GENES FOR THE BIOSYNTHESIS OF EPOTHILONES

FIELD OF THE INVENTION 

The present invention relates generally to polyketides and genes for their synthesis. In particular, the present invention relates to the isolation and characterization of novel polyketide synthase and nonribosomal peptide synthetase genes from Sorangium cellulosum that are necessary for the biosynthesis of epothilones A and B.

BACKGROUND OF THE INVENTION 

Polyketides are compounds synthesized from two-carbon building blocks, the β-carbon of which always carries a keto group, hence the name polyketide. These compounds include many important antibiotics, immunosuppressants, cancer chemotherapeutic agents, and other compounds possessing a broad range of biological properties. Their tremendous structural diversity derives from the different lengths of the polyketide chain, the different side chains introduced (either as part of the two-carbon building blocks or after the polyketide backbone is formed), and the stereochemistry of such groups. The keto groups may also be reduced to hydroxyls or enoyls, or removed altogether. Each round of two-carbon addition is carried out by a complex of enzymes called the polyketide synthase (PKS), in a manner similar to fatty acid biosynthesis.
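The β-carbon processing described above can be pictured as a small decision rule. In the standard model of modular PKSs (an assumption here, since the ketoreductase (KR) and dehydratase (DH) domains are not named in this section), the optional reductive domains of a module act in sequence: KR reduces the keto group to a hydroxyl, DH converts the hydroxyl into an enoyl double bond, and the enoyl reductase (ER) saturates that double bond, so that the keto group is removed altogether. The Python sketch below only illustrates that logic; the module layouts in it are hypothetical.

```python
from typing import FrozenSet


def beta_carbon_state(domains: FrozenSet[str]) -> str:
    """Return the beta-carbon functionality left by one extension cycle.

    Assumes the usual sequential logic: DH acts only on a KR product and
    ER acts only on a DH product, so a domain without its substrate does nothing.
    """
    if "KR" not in domains:
        return "keto"       # no reduction: the beta-keto group is retained
    if "DH" not in domains:
        return "hydroxyl"   # KR only: beta-hydroxyl
    if "ER" not in domains:
        return "enoyl"      # KR + DH: alpha,beta double bond
    return "methylene"      # KR + DH + ER: keto group removed altogether


# Hypothetical module layouts, for illustration only.
print(beta_carbon_state(frozenset({"KS", "AT", "ACP"})))                    # keto
print(beta_carbon_state(frozenset({"KS", "AT", "KR", "ACP"})))              # hydroxyl
print(beta_carbon_state(frozenset({"KS", "AT", "DH", "KR", "ACP"})))        # enoyl
print(beta_carbon_state(frozenset({"KS", "AT", "DH", "ER", "KR", "ACP"})))  # methylene
```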

The biosynthetic genes for an increasing number of polyketides have been isolated and sequenced. For example, U.S. Patent Nos. 5,639,949, 5,693,774, and 5,716,849 describe genes for the biosynthesis of soraphen. See also Schupp et al., FEMS Microbiology Letters 159: 201-207 (1998) and WO 98/07868, which describe genes for the biosynthesis of rifamycin, and U.S. Patent No. 5,876,991, which describes genes for the biosynthesis of tylactone. All of these documents are incorporated herein by reference. The encoded proteins generally fall into two types: type I and type II. Type I proteins are multifunctional, with several catalytic domains carrying out different enzymatic steps covalently linked together (e.g., the PKSs for erythromycin, soraphen, rifamycin, and avermectin; MacNeil et al., in Industrial Microorganisms: Basic and Applied Molecular Genetics (ed.: Baltz et al.), American Society for Microbiology, Washington D.C., pp. 245-256 (1993)), whereas type II proteins are monofunctional (Hutchinson et al., in Industrial Microorganisms: Basic and Applied Molecular Genetics (ed.: Baltz et al.), American Society for Microbiology, Washington D.C., pp. 203-216 (1993)).

For simpler polyketides such as actinorhodin (produced by Streptomyces coelicolor), the several rounds of two-carbon addition are carried out iteratively by PKS enzymes encoded by a single set of PKS genes. In contrast, the synthesis of more complex compounds such as erythromycin and soraphen involves PKS enzymes that are organized into modules, whereby each module carries out one round of two-carbon addition (for a review, see Hopwood et al., in Industrial Microorganisms: Basic and Applied Molecular Genetics (ed.: Baltz et al.), American Society for Microbiology, Washington D.C., pp. 267-275 (1993)).
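The contrast between iterative and modular PKSs can be made concrete with a toy count of backbone carbons. The Python sketch below is illustrative only and rests on simplifying assumptions: a hypothetical 2-carbon starter unit, every extension adding two backbone carbons, and "methylmalonyl" as an assumed label for an extender unit that also brings in a methyl side chain. In the modular function each module is used exactly once, so the number of modules fixes the number of rounds; in the iterative function one set of enzymes is simply reused.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Module:
    """One module of a modular PKS: performs a single round of chain extension."""
    extender: str = "malonyl"  # assumed labels: "malonyl" or "methylmalonyl"


def modular_pks(starter_carbons: int, modules: List[Module]) -> Tuple[int, int]:
    """Each module acts once, in order, so rounds == len(modules)."""
    backbone, methyl_branches = starter_carbons, 0
    for module in modules:
        backbone += 2  # every extension adds two backbone carbons
        if module.extender == "methylmalonyl":
            methyl_branches += 1  # side chain introduced with the building block
    return backbone, methyl_branches


def iterative_pks(starter_carbons: int, rounds: int) -> int:
    """One set of enzymes is reused for every round of extension."""
    backbone = starter_carbons
    for _ in range(rounds):
        backbone += 2
    return backbone


# Hypothetical example: a 2-carbon starter extended four times.
line = [Module(), Module("methylmalonyl"), Module(), Module()]
print(modular_pks(2, line))  # (10, 1)
print(iterative_pks(2, 4))   # 10
```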

Complex polyketides, and secondary metabolites in general, may contain substructures that are derived from amino acids instead of simple carboxylic acids. These building blocks are incorporated by nonribosomal peptide synthetases (NRPSs). NRPSs are multienzymes that are organized in modules. Each module is responsible for the addition (and, if required, the additional processing) of one amino acid building block. NRPSs activate amino acids by forming aminoacyl-adenylates, and capture the activated amino acids on the thiol groups of phosphopantetheinyl prosthetic groups on peptidyl carrier protein domains. Further, NRPSs modify the amino acids by epimerization, N-methylation, or cyclization if necessary, and catalyse the formation of peptide bonds between the enzyme-bound amino acids. NRPSs are responsible for the biosynthesis of peptide secondary metabolites such as cyclosporin, can provide polyketide chain terminator units as in rapamycin, and can form mixed systems with PKSs as in yersiniabactin biosynthesis.
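The one-module-per-residue logic of an NRPS can be sketched as a toy assembly line. In the illustrative Python example below, each hypothetical module names one amino acid plus an optional set of tailoring steps. The labels "E" (epimerization), "MT" (N-methylation) and "Cy" (cyclization) are conventional abbreviations assumed for this sketch, and the chemistry is represented by string labels rather than real structures.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class NrpsModule:
    """Toy NRPS module: one amino acid building block plus optional tailoring."""
    amino_acid: str
    tailoring: FrozenSet[str] = frozenset()  # subset of {"E", "MT", "Cy"} (assumed labels)


def run_assembly_line(modules: List[NrpsModule]) -> str:
    """Return the peptide produced when each module adds exactly one residue."""
    chain: List[str] = []
    for module in modules:
        # Adenylation: the amino acid is activated as an aminoacyl-adenylate and
        # captured on the phosphopantetheinyl thiol of the carrier protein domain.
        residue = module.amino_acid
        # Optional processing of the enzyme-bound residue.
        if "MT" in module.tailoring:
            residue = "N-Me-" + residue
        if "E" in module.tailoring:
            residue = "D-" + residue
        if "Cy" in module.tailoring:
            residue = residue + "(cyclized)"
        # Condensation: a peptide bond links this residue to the upstream chain.
        chain.append(residue)
    return "-".join(chain)


# Hypothetical three-module assembly line.
line = [
    NrpsModule("Val"),
    NrpsModule("Leu", frozenset({"MT"})),
    NrpsModule("Ala", frozenset({"E"})),
]
print(run_assembly_line(line))  # Val-N-Me-Leu-D-Ala
```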

Epothilones A and B are 16-membered macrocyclic polyketides with an acylcysteine-derived starter unit that are produced by the bacterium Sorangium cellulosum strain So ce90 (Gerth et al., J. Antibiotics 49: 560-563 (1996), incorporated herein by reference). The structure of epothilones A and B, wherein R signifies hydrogen (epothilone A) or methyl (epothilone B), is:
The epothilones have a narrow antifungal spectrum and are especially highly cytotoxic in animal cell cultures (see Höfle et al., Patent DE 4138042 (1993), incorporated herein by reference). Of significant importance, epothilones mimic the biological effects of taxol, both in vivo and in cultured cells (Bollag et al., Cancer Research 55: 2325-2333 (1995), incorporated herein by reference). Taxol and taxotere, which stabilize cellular microtubules, are cancer chemotherapeutic agents with significant activity against various human solid tumors (Rowinsky et al., J. Natl. Cancer Inst. 83: 1778-1781 (1991)). Competition studies have revealed that epothilones act as competitive inhibitors of taxol binding to microtubules, consistent with the interpretation that they share the same microtubule-binding site and possess a microtubule affinity similar to that of taxol. However, epothilones enjoy a significant advantage over taxol in that they lose much less potency than taxol against a multidrug-resistant cell line (Bollag et al. (1995)). Furthermore, epothilones are exported from cells by P-glycoprotein considerably less efficiently than taxol (Gerth et al. (1996)). In addition, several epothilone analogs have been synthesized that have superior cytotoxic activity compared to epothilone A or epothilone B, as demonstrated by their enhanced ability to induce the polymerization and stabilization of microtubules (WO 98/25929, incorporated herein by reference).

Despite the promise shown by the epothilones as anticancer agents, problems pertaining to the production of these compounds presently limit their commercial potential. The compounds are too complex for industrial-scale chemical synthesis and so must be produced by fermentation. Techniques for the genetic manipulation of myxobacteria such as Sorangium cellulosum are described in U.S. Patent No. 5,686,295, incorporated herein by reference. However, Sorangium cellulosum is notoriously difficult to ferment, and production levels of epothilones are therefore low. Recombinant production of epothilones in heterologous hosts that are more amenable to fermentation could solve current production problems. However, the genes that encode the polypeptides responsible for epothilone biosynthesis have heretofore not been isolated. Furthermore, the strain that produces epothilones, i.e. So ce90, also produces at least one additional polyketide, spirangien, which would be expected to greatly complicate the isolation of the genes specifically responsible for epothilone biosynthesis.

Therefore, in view of the foregoing, one object of the present invention is to isolate the genes that are involved in the synthesis of epothilones, particularly the genes that are involved in the synthesis of epothilones A and B in myxobacteria of the Sorangium/Polyangium group, such as Sorangium cellulosum strain So ce90. A further object of the invention is to provide a method for the recombinant production of epothilones for application in anticancer formulations.

SUMMARY OF THE INVENTION 

In furtherance of the aforementioned and other objects, the present invention unexpectedly overcomes the difficulties set forth above to provide for the first time a nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of epothilone. In a preferred embodiment, the nucleotide sequence is isolated from a species belonging to the Myxobacteria, most preferably Sorangium cellulosum.

In another preferred embodiment, the present invention provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of an epothilone, wherein said polypeptide comprises an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: SEQ ID NO:2, amino acids 11-437 of SEQ ID NO:2, amino acids 543-864 of SEQ ID NO:2, amino acids 974-1273 of SEQ ID NO:2, amino acids 1314-1385 of SEQ ID NO:2, SEQ ID NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids 118-125 of SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3, amino acids 353-363 of SEQ ID NO:3, amino acids 549-565 of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3, amino acids 669-684 of SEQ ID NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3, amino acids 918-940 of SEQ ID NO:3, amino acids 1268-1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID NO:3, amino acids 973-1256 of SEQ ID NO:3, amino acids 1344-1351 of SEQ ID NO:3, SEQ ID NO:4, amino acids 7-432 of SEQ ID NO:4, amino acids 539-859 of SEQ ID NO:4, amino acids 869-1037 of SEQ ID NO:4, amino acids 1439-1684 of SEQ ID NO:4, amino acids 1722-1792 of SEQ ID NO:4, SEQ ID NO:5, amino acids 39-457 of SEQ ID NO:5, amino acids 563-884 of SEQ ID NO:5, amino acids 1147-1399 of SEQ ID NO:5, amino acids 1434-1506 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 3886-4048 of SEQ ID NO:5, amino acids 4433-4719 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, SEQ ID NO:6, amino acids 35-454 of SEQ ID NO:6, amino acids 561-881 of SEQ ID NO:6, amino acids 1143-1393 of SEQ ID NO:6, amino acids 1430-1503 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, amino acids 2383-2551 of SEQ ID NO:6, amino acids 2671-3045 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, SEQ ID NO:7, amino acids 32-450 of SEQ ID NO:7, amino acids 556-877 of SEQ ID NO:7, amino acids 887-1051 of SEQ ID NO:7, amino acids 1478-1790 of SEQ ID NO:7, amino acids 1810-2055 of SEQ ID NO:7, amino acids 2093-2164 of SEQ ID NO:7, amino acids 2165-2439 of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:11, and SEQ ID NO:22.

In a more preferred embodiment, the present invention provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of an epothilone, wherein said polypeptide comprises an amino acid sequence selected from the group consisting of: SEQ ID NO:2, amino acids 11-437 of SEQ ID NO:2, amino acids 543-864 of SEQ ID NO:2, amino acids 974-1273 of SEQ ID NO:2, amino acids 1314-1385 of SEQ ID NO:2, SEQ ID NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids 118-125 of SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3, amino acids 353-363 of SEQ ID NO:3, amino acids 549-565 of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3, amino acids 669-684 of SEQ ID NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3, amino acids 918-940 of SEQ ID NO:3, amino acids 1268-1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID NO:3, amino acids 973-1256 of SEQ ID NO:3, amino acids 1344-1351 of SEQ ID NO:3, SEQ ID NO:4, amino acids 7-432 of SEQ ID NO:4, amino acids 539-859 of SEQ ID NO:4, amino acids 869-1037 of SEQ ID NO:4, amino acids 1439-1684 of SEQ ID NO:4, amino acids 1722-1792 of SEQ ID NO:4, SEQ ID NO:5, amino acids 39-457 of SEQ ID NO:5, amino acids 563-884 of SEQ ID NO:5, amino acids 1147-1399 of SEQ ID NO:5, amino acids 1434-1506 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 3886-4048 of SEQ ID NO:5, amino acids 4433-4719 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, SEQ ID NO:6, amino acids 35-454 of SEQ ID NO:6, amino acids 561-881 of SEQ ID NO:6, amino acids 1143-1393 of SEQ ID NO:6, amino acids 1430-1503 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, amino acids 2383-2551 of SEQ ID NO:6, amino acids 2671-3045 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, SEQ ID NO:7, amino acids 32-450 of SEQ ID NO:7, amino acids 556-877 of SEQ ID NO:7, amino acids 887-1051 of SEQ ID NO:7, amino acids 1478-1790 of SEQ ID NO:7, amino acids 1810-2055 of SEQ ID NO:7, amino acids 2093-2164 of SEQ ID NO:7, amino acids 2165-2439 of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:11, and SEQ ID NO:22.

In yet another preferred embodiment, the present invention provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of an epothilone, wherein said nucleotide sequence is substantially similar to a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1, nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1, nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1, nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1, nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1, nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 of SEQ ID NO:1.

In an especially preferred embodiment, the present invention provides a nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of an epothilone, wherein said nucleotide sequence is selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1, nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1, nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1, nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1, nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1, nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 of SEQ ID NO:1.

WO 99/66028    PCT/EP99/04171

In yet another preferred embodiment, the present invention provides an isolated nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypeptide involved in the biosynthesis of an epothilone, wherein said nucleotide sequence comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of a nucleotide sequence selected from the group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1, nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1, nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1, nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1, nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1, nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 of SEQ ID NO:1.
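The criterion of a consecutive 20-base-pair (or longer) portion identical to a portion of a listed region can be checked mechanically. The Python sketch below is illustrative only and rests on stated assumptions: coordinates are treated as 1-based and inclusive, "the complement of nucleotides 1900-3171" is read as the reverse complement (the opposite strand read 5' to 3'), and a short hypothetical stand-in sequence is used because SEQ ID NO:1 is not reproduced in this section. The helper names `region` and `shares_identical_run` are invented for the example.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse so the result reads 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def region(seq: str, start: int, end: int, complement: bool = False) -> str:
    """Extract 1-based, inclusive coordinates start..end from seq.

    With complement=True the opposite strand is returned (reverse complement).
    """
    sub = seq[start - 1:end]
    return reverse_complement(sub) if complement else sub


def shares_identical_run(query: str, reference: str, k: int = 20) -> bool:
    """True if some k-bp run of query is identical to a k-bp run of reference."""
    windows = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return any(query[i:i + k] in windows for i in range(len(query) - k + 1))


# Hypothetical stand-in for SEQ ID NO:1 (the real sequence is not shown here).
seq1 = "ATGC" * 10
ref = region(seq1, 1, 24)
print(shares_identical_run("GG" + seq1[3:23] + "CC", ref))  # True
print(shares_identical_run("A" * 30, ref))                  # False
```

Passing `k=25`, `30`, and so on up to `50` covers the longer portion lengths recited above.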

The present invention also provides a chimeric gene comprising a heterologous promoter sequence operatively linked to a nucleic acid molecule of the invention. Further, the present invention provides a recombinant vector comprising such a chimeric gene, wherein the vector is capable of being stably transformed into a host cell. Still further, the present invention provides a recombinant host cell comprising such a chimeric gene, wherein the host cell is capable of expressing the nucleotide sequence that encodes at least one polypeptide necessary for the biosynthesis of an epothilone. In a preferred embodiment, the recombinant host cell is a bacterium belonging to the order Actinomycetales, and in a more preferred embodiment the recombinant host cell is a strain of Streptomyces. In other embodiments, the recombinant host cell is any other bacterium amenable to fermentation, such as a pseudomonad or E. coli. Even further, the present invention provides a BAC clone comprising a nucleic acid molecule of the invention, preferably BAC clone pEPO15.

In another aspect, the present invention provides an isolated nucleic acid molecule 
comprising a nucleotide sequence that encodes an epothilone synthase domain. 

According to one embodiment, the epothilone synthase domain is a β-ketoacyl synthase (KS) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4, amino acids 39-457 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 35-454 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ ID NO:7. According to this embodiment, said KS domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4, amino acids 39-457 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 35-454 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ ID NO:7. Also, according to this embodiment, said nucleotide sequence preferably is substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, and nucleotides 55028-56284 of SEQ ID NO:1. According to this embodiment, said nucleotide sequence more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of a nucleotide sequence selected from the group consisting of: nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, and nucleotides 55028-56284 of SEQ ID NO:1. In addition, according to this embodiment, said nucleotide sequence most preferably is selected from the group consisting of: nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, and nucleotides 55028-56284 of SEQ ID NO:1.
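The paired amino acid and nucleotide ranges in these embodiments are related by simple codon arithmetic. The minimal Python sketch below assumes, as an inference from the coordinates rather than a statement of this section, that SEQ ID NO:4 is encoded by the open reading frame at nucleotides 16251-21749 of SEQ ID NO:1, read on the forward strand without introns. Under that assumption it reproduces the nucleotide ranges listed for the SEQ ID NO:4 KS domain (amino acids 7-432) and AT domain (amino acids 539-859). The function name `aa_to_nt` is hypothetical.

```python
from typing import Tuple


def aa_to_nt(orf_start: int, aa_start: int, aa_end: int) -> Tuple[int, int]:
    """Map 1-based inclusive residue numbers to 1-based inclusive nucleotides.

    Assumes a forward-strand ORF whose first codon begins at orf_start and
    no introns, as expected for bacterial genes.
    """
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("invalid residue range")
    nt_start = orf_start + 3 * (aa_start - 1)  # first base of the first codon
    nt_end = orf_start + 3 * aa_end - 1        # last base of the last codon
    return nt_start, nt_end


# Assumption: SEQ ID NO:4 is encoded by nucleotides 16251-21749 of SEQ ID NO:1.
print(aa_to_nt(16251, 7, 432))    # (16269, 17546)  KS domain
print(aa_to_nt(16251, 539, 859))  # (17865, 18827)  AT domain
```

The same arithmetic, with an assumed ORF start of 21746 for SEQ ID NO:5, gives nucleotides 21860-23116 for its first KS domain (amino acids 39-457).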

According to another embodiment, the epothilone synthase domain is an acyltransferase (AT) domain comprising an amino acid sequence substantially similar to an amino acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, amino acids 563-884 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 561-881 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7. According to this embodiment, said AT domain preferably comprises an amino acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, amino acids 563-884 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 561-881 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7. Also, according to this embodiment, said nucleotide sequence preferably is substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, and nucleotides 56600-57565 of SEQ ID NO:1. According to this embodiment, said nucleotide sequence more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of a nucleotide sequence selected from the group consisting of: nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, and nucleotides 56600-57565 of SEQ ID NO:1. In addition, according to this embodiment, said nucleotide sequence most preferably is selected from the group consisting of: nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, and nucleotides 56600-57565 of SEQ ID NO:1.

According to still another embodiment, the epothilone synthase domain is an enoyl
reductase (ER) domain comprising an amino acid sequence substantially similar to an amino
acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID
NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5,
and amino acids 1478-1790 of SEQ ID NO:7. According to this embodiment, said ER
domain preferably comprises an amino acid sequence selected from the group consisting of:
amino acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino
acids 6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7. Also,
according to this embodiment, said nucleotide sequence preferably is substantially similar to a
nucleotide sequence selected from the group consisting of: nucleotides 10529-11428 of
SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ
ID NO:1, and nucleotides 59366-60304 of SEQ ID NO:1. According to this embodiment,
said nucleotide sequence more preferably comprises a consecutive 20, 25, 30, 35, 40, 45,
or 50 (preferably 20) base pair nucleotide portion identical in sequence to a respective
consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of a nucleotide
sequence selected from the group consisting of: nucleotides 10529-11428 of SEQ ID NO:1,
nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, and
nucleotides 59366-60304 of SEQ ID NO:1. In addition, according to this embodiment, said
nucleotide sequence most preferably is selected from the group consisting of: nucleotides
10529-11428 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides
41369-42256 of SEQ ID NO:1, and nucleotides 59366-60304 of SEQ ID NO:1.
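The amino acid and nucleotide coordinates in these embodiments are linked by the three-base codon. Assuming a forward-strand gene whose translation starts at the first listed base, residue n occupies bases start + 3(n - 1) through start + 3n - 1. The sketch below applies this arithmetic to the ER domains, using the gene start positions given in the description of the sequences (epoA at nucleotide 7610, epoC at 21746, epoE at 54935). The helper `aa_to_nt` and the table names are hypothetical.

```python
from typing import Tuple

# 1-based first base of each forward-strand gene, from the description of SEQ ID NO:1.
GENE_START = {"SEQ ID NO:2": 7610, "SEQ ID NO:5": 21746, "SEQ ID NO:7": 54935}


def aa_to_nt(gene_start: int, aa_first: int, aa_last: int) -> Tuple[int, int]:
    """Map an inclusive residue range to the inclusive base range of its codons."""
    nt_first = gene_start + 3 * (aa_first - 1)  # first base of the first codon
    nt_last = gene_start + 3 * aa_last - 1      # last base of the last codon
    return nt_first, nt_last


# ER domains: (protein, residue range, nucleotide range stated in the text).
ER_DOMAINS = [
    ("SEQ ID NO:2", (974, 1273), (10529, 11428)),
    ("SEQ ID NO:5", (4433, 4719), (35042, 35902)),
    ("SEQ ID NO:5", (6542, 6837), (41369, 42256)),
    ("SEQ ID NO:7", (1478, 1790), (59366, 60304)),
]

for protein, (first, last), stated in ER_DOMAINS:
    computed = aa_to_nt(GENE_START[protein], first, last)
    status = "matches" if computed == stated else "differs from"
    print(f"{protein} aa {first}-{last} -> nt {computed[0]}-{computed[1]} ({status} text)")
```

With these inputs each computed range reproduces the nucleotide range stated for the corresponding ER domain; for example, amino acids 974-1273 of SEQ ID NO:2 map to nucleotides 10529-11428 of SEQ ID NO:1.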

According to another embodiment, the epothilone synthase domain is an acyl carrier
protein (ACP) domain, wherein said polypeptide comprises an amino acid sequence
substantially similar to an amino acid sequence selected from the group consisting of:
amino acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino
acids 1434-1506 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids
5010-5082 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, amino acids
1430-1503 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, and amino acids
2093-2164 of SEQ ID NO:7. According to this embodiment, said ACP domain preferably
comprises an amino acid sequence selected from the group consisting of: amino acids
1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids
1434-1506 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 5010-5082 of
SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 of SEQ ID
NO:6, amino acids 3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of SEQ ID
NO:7. Also, according to this embodiment, said nucleotide sequence preferably is
substantially similar to a nucleotide sequence selected from the group consisting of: nucleotides
11549-11764 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides
26045-26263 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides
36773-36991 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides
47811-48032 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, and nucleotides
61211-61426 of SEQ ID NO:1. According to this embodiment, said nucleotide sequence
more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base
pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40,



WO 99/66028 



PCT/EP99/04171 




45, or 50 (preferably 20) base pair portion of a nucleotide sequence selected from the
group consisting of: nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 21414-21626 of
SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ
ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID
NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID
NO:1, and nucleotides 61211-61426 of SEQ ID NO:1. In addition, according to this
embodiment, said nucleotide sequence most preferably is selected from the group consisting of:
nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1,
nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1,
nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1,
nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, and
nucleotides 61211-61426 of SEQ ID NO:1.

According to another embodiment, the epothilone synthase domain is a dehydratase
(DH) domain comprising an amino acid sequence substantially similar to an amino acid
sequence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO:4,
amino acids 3886-4048 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids
2383-2551 of SEQ ID NO:6, and amino acids 887-1051 of SEQ ID NO:7. According to this
embodiment, said DH domain preferably comprises an amino acid sequence selected from
the group consisting of: amino acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 of
SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID
NO:6, and amino acids 887-1051 of SEQ ID NO:7. Also, according to this embodiment, said
nucleotide sequence preferably is substantially similar to a nucleotide sequence selected
from the group consisting of: nucleotides 18855-19361 of SEQ ID NO:1, nucleotides
33401-33889 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides
50670-51176 of SEQ ID NO:1, and nucleotides 57593-58087 of SEQ ID NO:1. According to
this embodiment, said nucleotide sequence more preferably comprises a consecutive 20, 25,
30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion identical in sequence to a
respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of a
nucleotide sequence selected from the group consisting of: nucleotides 18855-19361 of
SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ
ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, and nucleotides 57593-58087 of SEQ
ID NO:1. In addition, according to this embodiment, said nucleotide sequence most
preferably is selected from the group consisting of: nucleotides 18855-19361 of SEQ ID NO:1,
nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1,
nucleotides 50670-51176 of SEQ ID NO:1, and nucleotides 57593-58087 of SEQ ID NO:1.

According to yet another embodiment, the epothilone synthase domain is a β-ketoreductase
(KR) domain comprising an amino acid sequence substantially similar to an amino
acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID
NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5,
amino acids 4729-4974 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino
acids 1143-1393 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, and amino
acids 1810-2055 of SEQ ID NO:7. According to this embodiment, said KR domain
preferably comprises an amino acid sequence selected from the group consisting of: amino
acids 1439-1684 of SEQ ID NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino acids
2645-2895 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids
6857-7101 of SEQ ID NO:5, amino acids 1143-1393 of SEQ ID NO:6, amino acids 3392-3636 of
SEQ ID NO:6, and amino acids 1810-2055 of SEQ ID NO:7. Also, according to this
embodiment, said nucleotide sequence preferably is substantially similar to a nucleotide
sequence selected from the group consisting of: nucleotides 20565-21302 of SEQ ID NO:1,
nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1,
nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1,
nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, and
nucleotides 60362-61099 of SEQ ID NO:1. According to this embodiment, said nucleotide
sequence more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably
20) base pair nucleotide portion identical in sequence to a respective consecutive 20, 25,
30, 35, 40, 45, or 50 (preferably 20) base pair portion of a nucleotide sequence selected from
the group consisting of: nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 25184-25942 of
SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ
ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID
NO:1, nucleotides 53697-54431 of SEQ ID NO:1, and nucleotides 60362-61099 of SEQ ID
NO:1. In addition, according to this embodiment, said nucleotide sequence most preferably
is selected from the group consisting of: nucleotides 20565-21302 of SEQ ID NO:1,
nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1,
nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1,
nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, and
nucleotides 60362-61099 of SEQ ID NO:1.

According to an additional embodiment, the epothilone synthase domain is a
methyltransferase (MT) domain comprising an amino acid sequence substantially similar to
amino acids 2671-3045 of SEQ ID NO:6. According to this embodiment, said MT domain
preferably comprises amino acids 2671-3045 of SEQ ID NO:6. Also, according to this
embodiment, said nucleotide sequence preferably is substantially similar to nucleotides
51534-52657 of SEQ ID NO:1. According to this embodiment, said nucleotide sequence
more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base
pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40,
45, or 50 (preferably 20) base pair portion of nucleotides 51534-52657 of SEQ ID NO:1. In
addition, according to this embodiment, said nucleotide sequence most preferably is
nucleotides 51534-52657 of SEQ ID NO:1.

According to another embodiment, the epothilone synthase domain is a thioesterase
(TE) domain comprising an amino acid sequence substantially similar to amino acids
2165-2439 of SEQ ID NO:7. According to this embodiment, said TE domain preferably comprises
amino acids 2165-2439 of SEQ ID NO:7. Also, according to this embodiment, said
nucleotide sequence preferably is substantially similar to nucleotides 61427-62254 of SEQ ID
NO:1. According to this embodiment, said nucleotide sequence more preferably comprises
a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion
identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20)
base pair portion of nucleotides 61427-62254 of SEQ ID NO:1. In addition, according to this
embodiment, said nucleotide sequence most preferably is nucleotides 61427-62254 of SEQ
ID NO:1.

In still another aspect, the present invention provides an isolated nucleic acid
molecule comprising a nucleotide sequence that encodes a non-ribosomal peptide synthetase,
wherein said non-ribosomal peptide synthetase comprises an amino acid sequence
substantially similar to an amino acid sequence selected from the group consisting of: SEQ
ID NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids 118-125 of SEQ ID NO:3, amino
acids 199-212 of SEQ ID NO:3, amino acids 353-363 of SEQ ID NO:3, amino acids
549-565 of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3, amino acids 669-684 of SEQ
ID NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3,
amino acids 903-912 of SEQ ID NO:3, amino acids 918-940 of SEQ ID NO:3, amino acids
1268-1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID NO:3, amino acids
973-1256 of SEQ ID NO:3, and amino acids 1344-1351 of SEQ ID NO:3. According to this
embodiment, said non-ribosomal peptide synthetase preferably comprises an amino acid
sequence selected from the group consisting of: SEQ ID NO:3, amino acids 72-81 of SEQ
ID NO:3, amino acids 118-125 of SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3,
amino acids 353-363 of SEQ ID NO:3, amino acids 549-565 of SEQ ID NO:3, amino acids
588-603 of SEQ ID NO:3, amino acids 669-684 of SEQ ID NO:3, amino acids 815-821 of
SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3,
amino acids 918-940 of SEQ ID NO:3, amino acids 1268-1274 of SEQ ID NO:3, amino
acids 1285-1297 of SEQ ID NO:3, amino acids 973-1256 of SEQ ID NO:3, and amino acids
1344-1351 of SEQ ID NO:3. Also, according to this embodiment, said nucleotide sequence
preferably is substantially similar to a nucleotide sequence selected from the group
consisting of: nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID
NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID
NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID
NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID
NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID
NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID
NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID
NO:1, nucleotides 14788-15639 of SEQ ID NO:1, and nucleotides 15901-15924 of SEQ ID
NO:1. According to this embodiment, said nucleotide sequence more preferably comprises
a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion
identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably
20) base pair portion of a nucleotide sequence selected from the group consisting of:
nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1,
nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1,
nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1,
nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1,
nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1,
nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1,
nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1,
nucleotides 14788-15639 of SEQ ID NO:1, and nucleotides 15901-15924 of SEQ ID NO:1. In
addition, according to this embodiment, said nucleotide sequence most preferably is selected
from the group consisting of: nucleotides 11872-16104 of SEQ ID NO:1, nucleotides
12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides
12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides
13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides
13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides
14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides
14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides
15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, and nucleotides
15901-15924 of SEQ ID NO:1.

The present invention further provides an isolated nucleic acid molecule comprising 
a nucleotide sequence that encodes a polypeptide comprising an amino acid sequence 
selected from the group consisting of SEQ ID NOs:2-23. 

In accordance with another aspect, the present invention also provides methods for
the recombinant production of polyketides such as epothilones in quantities large enough to
enable their purification and use in pharmaceutical formulations such as those for the
treatment of cancer. A specific advantage of these production methods is the chirality of the
molecules produced; production in transgenic organisms avoids the generation of
populations of racemic mixtures, within which some enantiomers may have reduced activity. In
particular, the present invention provides a method for heterologous expression of
epothilone in a recombinant host, comprising: (a) introducing into a host a chimeric gene
comprising a heterologous promoter sequence operatively linked to a nucleic acid molecule of the
invention that comprises a nucleotide sequence that encodes at least one polypeptide
involved in the biosynthesis of epothilone; and (b) growing the host in conditions that allow
biosynthesis of epothilone in the host. The present invention also provides a method for
producing epothilone, comprising: (a) expressing epothilone in a recombinant host by the
aforementioned method; and (b) extracting epothilone from the recombinant host.

According to still another aspect, the present invention provides an isolated polypep- 
tide comprising an amino acid sequence that consists of an epothilone synthase domain. 

According to one embodiment, the epothilone synthase domain is a β-ketoacyl
synthase (KS) domain comprising an amino acid sequence substantially similar to an amino
acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO:2,
amino acids 7-432 of SEQ ID NO:4, amino acids 39-457 of SEQ ID NO:5, amino acids
1524-1950 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids
5103-5525 of SEQ ID NO:5, amino acids 35-454 of SEQ ID NO:6, amino acids 1522-1946 of
SEQ ID NO:6, and amino acids 32-450 of SEQ ID NO:7. According to this embodiment,
said KS domain preferably comprises an amino acid sequence selected from the group
consisting of: amino acids 11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4,
amino acids 39-457 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids
3024-3449 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 35-454
of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ
ID NO:7.

According to another embodiment, the epothilone synthase domain is an
acyltransferase (AT) domain comprising an amino acid sequence substantially similar to an amino
acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO:2,
amino acids 539-859 of SEQ ID NO:4, amino acids 563-884 of SEQ ID NO:5, amino acids
2056-2377 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids
5631-5951 of SEQ ID NO:5, amino acids 561-881 of SEQ ID NO:6, amino acids 2053-2373 of
SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7. According to this embodiment,
said AT domain preferably comprises an amino acid sequence selected from the group
consisting of: amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4,
amino acids 563-884 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino
acids 3555-3876 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids
561-881 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, and amino acids
556-877 of SEQ ID NO:7.

According to still another embodiment, the epothilone synthase domain is an enoyl
reductase (ER) domain comprising an amino acid sequence substantially similar to an amino
acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID
NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5,
and amino acids 1478-1790 of SEQ ID NO:7. According to this embodiment, said ER
domain preferably comprises an amino acid sequence selected from the group consisting of:
amino acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino
acids 6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7.

According to another embodiment, the epothilone synthase domain is an acyl carrier
protein (ACP) domain, wherein said polypeptide comprises an amino acid sequence
substantially similar to an amino acid sequence selected from the group consisting of:
amino acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids
1434-1506 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids
5010-5082 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 of
SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of
SEQ ID NO:7. According to this embodiment, said ACP domain preferably comprises an
amino acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ
ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids 1434-1506 of SEQ ID NO:5,
amino acids 2932-3005 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino
acids 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 of SEQ ID NO:6, amino acids
3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of SEQ ID NO:7.

According to another embodiment, the epothilone synthase domain is a dehydratase
(DH) domain comprising an amino acid sequence substantially similar to an amino acid
sequence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO:4,
amino acids 3886-4048 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids
2383-2551 of SEQ ID NO:6, and amino acids 887-1051 of SEQ ID NO:7. According to this
embodiment, said DH domain preferably comprises an amino acid sequence selected from
the group consisting of: amino acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 of
SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID
NO:6, and amino acids 887-1051 of SEQ ID NO:7.

According to yet another embodiment, the epothilone synthase domain is a β-ketoreductase
(KR) domain comprising an amino acid sequence substantially similar to an amino
acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID
NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5,
amino acids 4729-4974 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino
acids 1143-1393 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, and amino
acids 1810-2055 of SEQ ID NO:7. According to this embodiment, said KR domain
preferably comprises an amino acid sequence selected from the group consisting of: amino acids
1439-1684 of SEQ ID NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino acids
2645-2895 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 6857-7101 of
SEQ ID NO:5, amino acids 1143-1393 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID
NO:6, and amino acids 1810-2055 of SEQ ID NO:7.

According to an additional embodiment, the epothilone synthase domain is a methyl- 
transferase (MT) domain comprising an amino acid sequence substantially similar to amino 
acids 2671-3045 of SEQ ID NO:6. According to this embodiment, said MT domain preferab- 
ly comprises amino acids 2671-3045 of SEQ ID NO:6. 

According to another embodiment, the epothilone synthase domain is a thioesterase
(TE) domain comprising an amino acid sequence substantially similar to amino acids
2165-2439 of SEQ ID NO:7. According to this embodiment, said TE domain preferably comprises
amino acids 2165-2439 of SEQ ID NO:7.

Other aspects and advantages of the present invention will become apparent to those 
skilled in the art from a study of the following description of the invention and non-limiting 
examples. 

DEFINITIONS 

In describing the present invention, the following terms will be employed, and are 
intended to be defined as indicated below. 

Associated With / Operatively Linked: Refers to two DNA sequences that are related
physically or functionally. For example, a promoter or regulatory DNA sequence is said to
be "associated with" a DNA sequence that codes for an RNA or a protein if the two
sequences are operatively linked, or situated such that the regulatory DNA sequence will affect the
expression level of the coding or structural DNA sequence.

Chimeric Gene: A recombinant DNA sequence in which a promoter or regulatory
DNA sequence is operatively linked to, or associated with, a DNA sequence that codes for
an mRNA or which is expressed as a protein, such that the regulatory DNA sequence is able
to regulate transcription or expression of the associated DNA sequence. The regulatory
DNA sequence of the chimeric gene is not normally operatively linked to the associated
DNA sequence as found in nature.

Coding DNA Sequence: A DNA sequence that is translated in an organism to pro- 
duce a protein. 

Domain: That part of a polyketide synthase necessary for a given distinct activity.
Examples include acyl carrier protein (ACP), β-ketosynthase (KS), acyltransferase (AT),
β-ketoreductase (KR), dehydratase (DH), enoylreductase (ER), and thioesterase (TE)
domains.

Epothilones: 16-membered macrocyclic polyketides naturally produced by the
bacterium Sorangium cellulosum strain So ce90, which mimic the biological effects of taxol. In
this application, "epothilone" refers to the class of polyketides that includes epothilone A
and epothilone B, as well as analogs thereof such as those described in WO 98/25929.

Epothilone Synthase: A polyketide synthase responsible for the biosynthesis of
epothilone.

Gene: A defined region that is located within a genome and that, besides the afore- 
mentioned coding DNA sequence, comprises other, primarily regulatory, DNA sequences 
responsible for the control of the expression, that is to say the transcription and translation, 
of the coding portion. 

Heterologous DNA Sequence: A DNA sequence not naturally associated with a host 
cell into which it is introduced, including non-naturally occurring multiple copies of a natu- 
rally occurring DNA sequence. 

Homologous DNA Sequence: A DNA sequence naturally associated with a host cell
into which it is introduced. 

Homologous Recombination: Reciprocal exchange of DNA fragments between 
homologous DNA molecules. 

Isolated: In the context of the present invention, an isolated nucleic acid molecule or 
an isolated enzyme is a nucleic acid molecule or enzyme that, by the hand of man, exists 
apart from its native environment and is therefore not a product of nature. An isolated 
nucleic acid molecule or enzyme may exist in a purified form or may exist in a non-native
environment such as, for example, a recombinant host cell. 

Module: A genetic element encoding all of the distinct activities required in a single
round of polyketide biosynthesis, i.e., one condensation step and all the β-carbonyl
processing steps associated therewith. Each module encodes an ACP, a KS, and an AT
activity to accomplish the condensation portion of the biosynthesis, and selected
post-condensation activities to effect the β-carbonyl processing.
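Applying this module definition to the domain coordinates listed earlier for EPOS C (SEQ ID NO:5), the domains can be grouped into modules. The sketch below sorts the domains by start position and opens a new module at each KS domain. That splitting rule is a simplifying assumption for illustration (it would not, for example, handle a module lacking a KS domain), and `group_into_modules` is a hypothetical helper.

```python
# Domain boundaries (amino acid positions) in EPOS C, SEQ ID NO:5, as listed in the text.
EPOS_C_DOMAINS = [
    ("KS", 39, 457), ("KS", 1524, 1950), ("KS", 3024, 3449), ("KS", 5103, 5525),
    ("AT", 563, 884), ("AT", 2056, 2377), ("AT", 3555, 3876), ("AT", 5631, 5951),
    ("DH", 3886, 4048), ("DH", 5964, 6132),
    ("ER", 4433, 4719), ("ER", 6542, 6837),
    ("KR", 1147, 1399), ("KR", 2645, 2895), ("KR", 4729, 4974), ("KR", 6857, 7101),
    ("ACP", 1434, 1506), ("ACP", 2932, 3005), ("ACP", 5010, 5082), ("ACP", 7140, 7211),
]


def group_into_modules(domains):
    """Sort domains by start position and open a new module at every KS domain."""
    modules = []
    for name, _start, _end in sorted(domains, key=lambda d: d[1]):
        if name == "KS" or not modules:
            modules.append([])
        modules[-1].append(name)
    return modules


for number, module in enumerate(group_into_modules(EPOS_C_DOMAINS), start=1):
    print(f"module {number}: {'-'.join(module)}")
```

For the coordinates given in this document the sketch yields four modules, two ordered KS-AT-KR-ACP and two ordered KS-AT-DH-ER-KR-ACP; each contains the KS, AT, and ACP activities that the definition requires.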

NRPS: A non-ribosomal peptide synthetase, which is a complex of enzymatic
activities responsible for the incorporation of amino acids into secondary metabolites
including, for example, amino acid adenylation, epimerization, N-methylation, cyclization,
peptidyl carrier protein, and condensation domains. A functional NRPS is one that
catalyzes the incorporation of an amino acid into a secondary metabolite.

NRPS gene: One or more genes encoding NRPSs for producing functional secon- 
dary metabolites, e.g., epothilones A and B, when under the direction of one or more com- 
patible control elements. 

Nucleic Acid Molecule: A linear segment of single- or double-stranded DNA or RNA
that can be isolated from any source. In the context of the present invention, the nucleic 
acid molecule is preferably a segment of DNA. 

ORF: Open Reading Frame. 

PKS: A polyketide synthase, which is a complex of enzymatic activities (domains)
responsible for the biosynthesis of polyketides including, for example, ketoreductase,
dehydratase, acyl carrier protein, enoylreductase, ketoacyl ACP synthase, and acyltransferase.
A functional PKS is one that catalyzes the synthesis of a polyketide.

PKS Genes: One or more genes encoding various polypeptides required for produ- 
cing functional polyketides, e.g., epothilones A and B, when under the direction of one or 
more compatible control elements. 

Substantially Similar: With respect to nucleic acids, a nucleic acid molecule that has
at least 60 percent sequence identity with a reference nucleic acid molecule. In a preferred
embodiment, a substantially similar DNA sequence is at least 80% identical to a reference
DNA sequence; in a more preferred embodiment, a substantially similar DNA sequence is at
least 90% identical to a reference DNA sequence; and in a most preferred embodiment, a
substantially similar DNA sequence is at least 95% identical to a reference DNA sequence.
A substantially similar DNA sequence preferably encodes a protein or peptide having
substantially the same activity as the protein or peptide encoded by the reference DNA
sequence. A substantially similar nucleotide sequence typically hybridizes to a reference
nucleic acid molecule, or fragments thereof, under the following conditions: hybridization in
7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM EDTA at 50°C; wash with 2X
SSC, 1% SDS, at 50°C. With respect to proteins or peptides, a substantially similar amino
acid sequence is an amino acid sequence that is at least 90% identical to the amino acid
sequence of a reference protein or peptide and has substantially the same activity as the
reference protein or peptide.
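The nucleic acid tiers in this definition (60, 80, 90, and 95 percent identity) can be expressed as a small classifier. The definition does not state how identity is computed, so the sketch assumes two pre-aligned sequences of equal length without gaps and simply counts identical positions; a real comparison would normally start from a pairwise alignment. The function names are hypothetical, and the protein criterion (at least 90 percent identity plus substantially the same activity) is not modeled, because activity cannot be read from sequence alone.

```python
from typing import Optional


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions between two pre-aligned, equal-length sequences.

    Assumption: the sequences are already aligned and contain no gaps; the text
    does not specify an alignment method or gap treatment.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Nucleic acid identity thresholds from the definition, most stringent first.
DNA_TIERS = [
    (95.0, "most preferred"),
    (90.0, "more preferred"),
    (80.0, "preferred"),
    (60.0, "substantially similar"),
]


def dna_similarity_tier(query: str, reference: str) -> Optional[str]:
    """Return the most stringent tier the query reaches, or None below 60% identity."""
    identity = percent_identity(query, reference)
    for threshold, label in DNA_TIERS:
        if identity >= threshold:
            return label
    return None


print(dna_similarity_tier("ACGTACGTAA", "ACGTACGTAC"))  # more preferred (9 of 10 match)
```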

Transformation: A process for introducing heterologous nucleic acid into a host cell 
or organism. 

Transformed / Transgenic / Recombinant: Refers to a host organism such as a bac- 
terium into which a heterologous nucleic acid molecule has been introduced. The nucleic 
acid molecule can be stably integrated into the genome of the host or the nucleic acid mo- 
lecule can also be present as an extrachromosomal molecule. Such an extrachromosomal 
molecule can be auto-replicating. Transformed cells, tissues, or plants are understood to 
encompass not only the end product of a transformation process, but also transgenic
progeny thereof. A "non-transformed", "non-transgenic", or "non-recombinant" host refers to a
wild-type organism, i.e., a bacterium, which does not contain the heterologous nucleic acid 
molecule. 

Nucleotides are indicated by their bases by the following standard abbreviations: 
adenine (A), cytosine (C), thymine (T), and guanine (G). Amino acids are likewise indicated 
by the following standard abbreviations: alanine (Ala; A), arginine (Arg; R), asparagine
(Asn; N), aspartic acid (Asp; D), cysteine (Cys; C), glutamine (Gln; Q), glutamic acid (Glu; E),
glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K),
methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine
(Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V). Furthermore, (Xaa; X)
represents any amino acid. 
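When handling sequence data, the abbreviation list above maps directly onto a lookup table. The sketch below converts hyphen-separated three-letter codes (an assumed input format) to one-letter codes; `to_one_letter` is a hypothetical helper, not part of any named library.

```python
# Three-letter to one-letter amino acid codes, as listed above (Xaa/X = any amino acid).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}


def to_one_letter(residues: str, sep: str = "-") -> str:
    """Convert e.g. 'Met-Lys-Xaa' to 'MKX'; matching is case-insensitive."""
    lookup = {code.lower(): letter for code, letter in THREE_TO_ONE.items()}
    try:
        return "".join(lookup[r.strip().lower()] for r in residues.split(sep))
    except KeyError as err:
        raise ValueError(f"unknown residue code: {err.args[0]}") from None


print(to_one_letter("Met-Lys-Xaa"))  # MKX
```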

DESCRIPTION OF THE SEQUENCES IN THE SEQUENCE LISTING 

SEQ ID NO:1 is the nucleotide sequence of a 68750 bp contig containing 22 open 
reading frames (ORFs), which comprises the epothilone biosynthesis genes. 

SEQ ID NO:2 is the protein sequence of a type I polyketide synthase (EPOS A) 
encoded by epoA (nucleotides 7610-11875 of SEQ ID NO:1).

SEQ ID NO:3 is the protein sequence of a non-ribosomal peptide synthetase (EPOS 
P) encoded by epoP (nucleotides 11872-16104 of SEQ ID NO:1). 

SEQ ID NO:4 is the protein sequence of a type I polyketide synthase (EPOS B) 
encoded by epoB (nucleotides 16251-21749 of SEQ ID NO:1).

SEQ ID NO:5 is the protein sequence of a type I polyketide synthase (EPOS C) 
encoded by epoC (nucleotides 21746-43519 of SEQ ID NO:1).

SEQ ID NO:6 is the protein sequence of a type I polyketide synthase (EPOS D) 
encoded by epoD (nucleotides 43524-54920 of SEQ ID NO:1). 

SEQ ID NO:7 is the protein sequence of a type I polyketide synthase (EPOS E) 
encoded by epoE (nucleotides 54935-62254 of SEQ ID NO:1). 

SEQ ID NO:8 is the protein sequence of a cytochrome P450 oxygenase homologue 
(EPOS F) encoded by epoF (nucleotides 62369-63628 of SEQ ID NO:1). 

SEQ ID NO:9 is a partial protein sequence (partial Orf 1) encoded by orf1 (nucleotides
1-1826 of SEQ ID NO:1).

SEQ ID NO:10 is a protein sequence (Orf 2) encoded by orf2 (nucleotides 3171-1900
on the reverse complement strand of SEQ ID NO:1).

SEQ ID NO:11 is a protein sequence (Orf 3) encoded by orf3 (nucleotides 3415-5556
of SEQ ID NO:1).

SEQ ID NO:12 is a protein sequence (Orf 4) encoded by orf4 (nucleotides 5992-5612 
on the reverse complement strand of SEQ ID NO:1). 

SEQ ID NO:13 is a protein sequence (Orf 5) encoded by orf5 (nucleotides 6226-6675
of SEQ ID NO:1).

SEQ ID NO:14 is a protein sequence (Orf 6) encoded by orf6 (nucleotides 63779-
64333 of SEQ ID NO:1).

SEQ ID NO:15 is a protein sequence (Orf 7) encoded by orf7 (nucleotides 64290-
63853 on the reverse complement strand of SEQ ID NO:1).

SEQ ID NO:16 is a protein sequence (Orf 8) encoded by orf8 (nucleotides 64363-
64920 of SEQ ID NO:1).

SEQ ID NO:17 is a protein sequence (Orf 9) encoded by orf9 (nucleotides 64727-
64287 on the reverse complement strand of SEQ ID NO:1).

SEQ ID NO:18 is a protein sequence (Orf 10) encoded by orf10 (nucleotides 65063-
65767 of SEQ ID NO:1).

SEQ ID NO:19 is a protein sequence (Orf 11) encoded by orf11 (nucleotides 65874-
65008 on the reverse complement strand of SEQ ID NO:1).

SEQ ID NO:20 is a protein sequence (Orf 12) encoded by orf12 (nucleotides 66338- 
65871 on the reverse complement strand of SEQ ID NO:1). 

SEQ ID NO:21 is a protein sequence (Orf 13) encoded by orf13 (nucleotides 66667-
67137 of SEQ ID NO:1).

SEQ ID NO:22 is a protein sequence (Orf 14) encoded by orf14 (nucleotides 67334-
68251 of SEQ ID NO:1).

SEQ ID NO:23 is a partial protein sequence (partial Orf 15) encoded by orf15
(nucleotides 68346-68750 of SEQ ID NO:1).
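
In the listing above, each ORF is given as an inclusive, 1-based nucleotide range in SEQ ID
NO:1, with the start coordinate written higher than the end coordinate for ORFs read from
the reverse complement strand. The following Python sketch illustrates that convention. The
helper names are hypothetical, only a few ORFs are copied in, and the toy contig merely
stands in for SEQ ID NO:1, which is not reproduced here.

```python
# Sketch: interpreting the ORF coordinates listed for SEQ ID NO:1.
# Assumptions: coordinates are 1-based and inclusive; start > end marks an ORF
# on the reverse complement strand. Helper names are hypothetical.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orf_length(start: int, end: int) -> int:
    """Length in bp of an inclusive range, on either strand."""
    return abs(end - start) + 1


def extract_orf(contig: str, start: int, end: int) -> str:
    """Slice an ORF out of a contig, reverse-complementing when start > end."""
    if start <= end:
        return contig[start - 1:end]
    return reverse_complement(contig[end - 1:start])


# A few entries copied from the listing (start, end in SEQ ID NO:1).
ORFS = {
    "epoA": (7610, 11875),
    "epoP": (11872, 16104),
    "epoF": (62369, 63628),
    "orf2": (3171, 1900),     # reverse complement strand
    "orf12": (66338, 65871),  # reverse complement strand
}

if __name__ == "__main__":
    for name, (start, end) in ORFS.items():
        length = orf_length(start, end)
        print(f"{name}: {length} bp, {length // 3} codons, whole codons: {length % 3 == 0}")

    toy_contig = "ATGAAACCCTTTGGG"  # toy stand-in, not SEQ ID NO:1
    print(extract_orf(toy_contig, 1, 3))  # forward strand
    print(extract_orf(toy_contig, 3, 1))  # reverse complement strand
```

By this arithmetic, epoA spans 4266 bp (1422 codons), and the listed ranges of epoA and epoP
share the 4 positions 11872-11875.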

SEQ ID NO:24 is the universal reverse PCR primer sequence.
SEQ ID NO:25 is the universal forward PCR primer sequence.
SEQ ID NO:26 is the NH24 end "B" PCR primer sequence.
SEQ ID NO:27 is the NH2 end "A" PCR primer sequence.
SEQ ID NO:28 is the NH2 end "B" PCR primer sequence.
SEQ ID NO:29 is the pEPO15-NH6 end "B" PCR primer sequence.
SEQ ID NO:30 is the pEPO15-H2.7 end "A" PCR primer sequence.

DEPOSIT INFORMATION 

The following material has been deposited with the Agricultural Research Service
Patent Culture Collection (NRRL), 1815 North University Street, Peoria, Illinois 61604, under
the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for
the Purposes of Patent Procedure. All restrictions on the availability of the deposited
material will be irrevocably removed upon the granting of a patent.

Deposited Material    Accession Number    Deposit Date

pEPO15                NRRL B-30033        June 11, 1998

pEPO32                NRRL B-30119        April 16, 1999

DETAILED DESCRIPTION OF THE INVENTION

The genes involved in the biosynthesis of epothilones can be isolated using the
techniques according to the present invention. The preferred procedure for isolating
epothilone biosynthesis genes requires the isolation of genomic DNA from an organism
identified as producing epothilones A and B, and the transfer of the isolated DNA on a
suitable plasmid or vector to a host organism that does not normally produce the polyketide,
followed by the identification of transformed host colonies to which the epothilone-producing
ability has been conferred. Using a technique such as λ::Tn5 transposon mutagenesis (de
Bruijn & Lupski, Gene 27: 131-149 (1984)), the region of the transforming DNA that confers
epothilone production can be defined more precisely. Alternatively or additionally, the
transforming epothilone-conferring DNA can be cleaved into smaller fragments, and the
smallest fragment that maintains the epothilone-conferring ability can be further characterized.
Whereas the host organism lacking the ability to produce epothilone may be a different
species from the organism from which the polyketide derives, a variation of this technique
involves transforming DNA from the producing organism back into the same organism after
its epothilone-producing ability has been disrupted by mutagenesis. In this method, an
epothilone-producing organism is mutated and non-epothilone-producing mutants are isolated.
These are then complemented with genomic DNA isolated from the epothilone-producing
parent strain.

A further example of a technique that can be used to isolate genes required for
epothilone biosynthesis is the use of transposon mutagenesis to generate mutants of an
epothilone-producing organism that, after mutagenesis, fail to produce the polyketide. The
region of the host genome responsible for epothilone production is thereby tagged by the
transposon and can be recovered and used as a probe to isolate the native genes from the
parent strain. PKS genes that are required for the synthesis of polyketides and that are
similar to known PKS genes may be isolated by virtue of their sequence homology to
biosynthetic genes of known sequence, such as those for the biosynthesis of rifamycin or
soraphen. Techniques suitable for isolation by homology include standard library screening
by DNA hybridization.

Preferred for use as a probe molecule is a DNA fragment that is obtainable from a
gene or another DNA sequence that plays a part in the synthesis of a known polyketide. A
preferred probe molecule comprises a 1.2 kb SmaI DNA fragment encoding the ketosynthase
domain of the fourth module of the soraphen PKS (U.S. Patent No. 5,716,849), and a
more preferred probe molecule comprises the β-ketoacyl synthase domains from the first
and second modules of the rifamycin PKS (Schupp et al., FEMS Microbiology Letters 159:
201-207 (1998)). These can be used to probe a gene library of an epothilone-producing
microorganism to isolate the PKS genes responsible for epothilone biosynthesis.

Despite the well-known difficulties with PKS gene isolation in general and despite 
the difficulties expected to be encountered with the isolation of epothilone biosynthesis 
genes in particular, by using the methods described in the instant specification, biosynthetic 
genes for epothilones A and B can surprisingly be cloned from a microorganism that produ- 
ces that polyketide. Using the methods of gene manipulation and recombinant production 
described in this specification, the cloned PKS genes can be modified and expressed in 
transgenic host organisms. 

The isolated epothilone biosynthetic genes can be expressed in heterologous hosts
to enable production of the polyketide with greater efficiency than might be possible
from native hosts. Techniques for these genetic manipulations are specific for the different
available hosts and are known in the art. For example, heterologous genes can be expressed
in Streptomyces and other actinomycetes using techniques such as those described in
McDaniel et al., Science 262: 1546-1550 (1993) and Kao et al., Science 265: 509-512
(1994), both of which are incorporated herein by reference. See also Rowe et al., Gene
216: 215-223 (1998); Holmes et al., EMBO Journal 12(8): 3183-3191 (1993); and Bibb et al.,
Gene 38: 215-226 (1985), all of which are incorporated herein by reference.

Alternatively, genes responsible for polyketide biosynthesis, i.e., epothilone biosynthetic
genes, can also be expressed in other host organisms such as pseudomonads and E.
coli. Techniques for these genetic manipulations are specific for the different available hosts
and are known in the art. For example, PKS genes have been successfully expressed in E.
coli using the pT7-7 vector, which uses the T7 promoter. See Tabor et al., Proc. Natl.
Acad. Sci. USA 82: 1074-1078 (1985), incorporated herein by reference. In addition, the
expression vectors pKK223-3 and pKK223-2 can be used to express heterologous genes in
E. coli, either in transcriptional or translational fusion, behind the tac or trc promoter. For
the expression of operons encoding multiple ORFs, the simplest procedure is to insert the
operon into a vector such as pKK223-3 in transcriptional fusion, allowing the cognate ribosome
binding sites of the heterologous genes to be used. Techniques for overexpression in
gram-positive species such as Bacillus are also known in the art and can be used in the
context of this invention (Quax et al., in: Industrial Microorganisms: Basic and Applied
Molecular Genetics, Eds. Baltz et al., American Society for Microbiology, Washington (1993)).

Other expression systems that may be used with the epothilone biosynthetic genes
of the invention include yeast and baculovirus expression systems. See, for example, "The
Expression of Recombinant Proteins in Yeasts," Sudbery, P. E., Curr. Opin. Biotechnol.
7(5): 517-524 (1996); "Methods for Expressing Recombinant Proteins in Yeast," Mackay, et
al., Editor(s): Carey, Paul R., Protein Eng. Des. 105-153, Publisher: Academic, San Diego,
Calif. (1996); "Expression of heterologous gene products in yeast," Pichuantes, et al.,
Editor(s): Cleland, J. L., Craik, C. S., Protein Eng. 129-161, Publisher: Wiley-Liss, New
York, N.Y. (1996); WO 98/27203; Kealey et al., Proc. Natl. Acad. Sci. USA 95: 505-509
(1998); "Insect Cell Culture: Recent Advances, Bioengineering Challenges and Implications
in Protein Production," Palomares, et al., Editor(s): Galindo, Enrique; Ramirez, Octavio T.,
Adv. Bioprocess Eng. Vol. II, Invited Pap. Int. Symp., 2nd (1998) 25-52, Publisher: Kluwer,
Dordrecht, Neth.; "Baculovirus Expression Vectors," Jarvis, Donald L., Editor(s): Miller, Lois
K., Baculoviruses 389-431, Publisher: Plenum, New York, N.Y. (1997); "Production of
Heterologous Proteins Using the Baculovirus/Insect Expression System," Griffiths, et al.,
Methods Mol. Biol. (Totowa, N.J.) 75 (Basic Cell Culture Protocols (2nd Edition)) 427-440
(1997); and "Insect Cell Expression Technology," Luckow, Verne A., Protein Eng. 183-218,
Publisher: Wiley-Liss, New York, N.Y. (1996); all of which are incorporated herein by
reference.

Another consideration for the expression of PKS genes in heterologous hosts is that
PKS enzymes require posttranslational modification by phosphopantetheinylation before
they can synthesize polyketides. However, the enzymes responsible for this modification of
type I PKS enzymes, the phosphopantetheinyl (P-pant) transferases, are not normally
present in many hosts such as E. coli. This problem can be solved by coexpressing a P-pant
transferase with the PKS genes in the heterologous host, as described by Kealey et al.,
Proc. Natl. Acad. Sci. USA 95: 505-509 (1998), incorporated herein by reference.

Therefore, for the purposes of polyketide production, the significant criteria in the
choice of host organism are its ease of manipulation, rapidity of growth (i.e., fermentation),
possession of the proper molecular machinery for processes such as posttranslational
modification, and its lack of susceptibility to the polyketide being overproduced. Most
preferred host organisms are actinomycetes such as strains of Streptomyces. Other
preferred host organisms are pseudomonads and E. coli. The above-described methods of
polyketide production have significant advantages over the technology currently used in the
preparation of the compounds. These advantages include lower production cost, the ability
to produce greater quantities of the compounds, and the ability to produce compounds as the
preferred biological enantiomer, as opposed to the racemic mixtures inevitably generated by
organic synthesis. Compounds produced by heterologous hosts can be used in medical
(e.g., cancer treatment in the case of epothilones) as well as agricultural applications.

EXPERIMENTAL

The invention will be further described by reference to the following detailed
examples. These examples are provided for purposes of illustration only, and are not
intended to be limiting unless otherwise specified. Standard recombinant DNA and molecular
cloning techniques used here are well known in the art and are described by Ausubel (ed.),
Current Protocols in Molecular Biology, John Wiley and Sons, Inc. (1994); T. Maniatis, E. F.
Fritsch and J. Sambrook, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor
Laboratory, Cold Spring Harbor, NY (1989); and by T. J. Silhavy, M. L. Berman, and L. W.
Enquist, Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold Spring
Harbor, NY (1984).

Example 1: Cultivation of an Epothilone-Producing Strain of Sorangium cellulosum

Sorangium cellulosum strain 90 (DSM 6773, Deutsche Sammlung von Mikroorganismen
und Zellkulturen, Braunschweig) is streaked out and grown (30°C) on an agar plate of
SolE medium (0.35% glucose, 0.05% tryptone, 0.15% MgSO4 x 7H2O, 0.05% ammonium
sulfate, 0.1% CaCl2, 0.006% K2HPO4, 0.01% sodium dithionite, 0.0008% Fe-EDTA, 1.2%
HEPES, 3.5% [vol/vol] supernatant of sterilized stationary S. cellulosum culture), pH
adjusted to 7.4. Cells from about 1 square cm are picked and inoculated into 5 ml of G51t
liquid medium (0.2% glucose, 0.5% starch, 0.2% tryptone, 0.1% probion S, 0.05% CaCl2 x
2H2O, 0.05% MgSO4 x 7H2O, 1.2% HEPES, pH adjusted to 7.4) and incubated at 30°C with
shaking at 225 rpm. After 4 days, the culture is transferred into 50 ml of G51t and
incubated as above for 5 days. This culture is used to inoculate 500 ml of G51t, which is
incubated as above for 6 days. The culture is centrifuged for 10 minutes at 4000 rpm and
the cell pellet is resuspended in 50 ml of G51t.
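
The media in this and the following examples are specified as percentages. Reading these as
% (w/v), i.e., grams per 100 ml, which is an assumption since the text does not state it
explicitly, the amount of each component for a given batch volume can be computed as in the
following Python sketch for G51t medium. The function name is hypothetical, and vol/vol
additives and pH adjustment are not modeled.

```python
# Sketch: converting a % (w/v) medium recipe into grams for a batch volume.
# Assumption: percentages are w/v (g per 100 ml); vol/vol additives and pH
# adjustment are outside the scope of this hypothetical helper.

G51T_PERCENT_WV = {
    "glucose": 0.2,
    "starch": 0.5,
    "tryptone": 0.2,
    "probion S": 0.1,
    "CaCl2 x 2H2O": 0.05,
    "MgSO4 x 7H2O": 0.05,
    "HEPES": 1.2,
}


def grams_for_volume(recipe_percent: dict[str, float], volume_ml: float) -> dict[str, float]:
    """Return grams of each component needed for volume_ml of medium."""
    return {name: round(pct * volume_ml / 100.0, 4) for name, pct in recipe_percent.items()}


if __name__ == "__main__":
    # The 500 ml scale-up step of Example 1.
    for name, grams in grams_for_volume(G51T_PERCENT_WV, 500).items():
        print(f"{name}: {grams} g")
```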

Example 2: Generation of a Bacterial Artificial Chromosome (Bac) Library 

To generate a Bac library, S. cellulosum cells cultivated as described in Example 1
above are embedded into agarose blocks and lysed, and the liberated genomic DNA is
partially digested with the restriction enzyme HindIII. The digested DNA is separated on an
agarose gel by pulsed-field electrophoresis. Large (approximately 90-150 kb) DNA fragments
are isolated from the agarose gel and ligated into the vector pBeloBAC11. pBeloBAC11
contains a gene encoding chloramphenicol resistance, a multiple cloning site in the lacZ gene
providing for blue/white selection on appropriate medium, and the genes required for the
replication and maintenance of the plasmid at one or two copies per cell. The ligation
mixture is used to transform Escherichia coli DH10B electrocompetent cells using standard
electroporation techniques. Chloramphenicol-resistant recombinant (white, lacZ mutant)
colonies are transferred to a positively charged nylon membrane filter in 384 3x3 grid
format. The clones are lysed and the DNA is cross-linked to the filters. The same clones are
also preserved as liquid cultures at -80°C.

Example 3: Screening the Bac Library of Sorangium cellulosum 90 for the Presence of Type 
I Polyketide Synthase-Related Sequences 

The Bac library filters are probed by standard Southern hybridization procedures.
The DNA probes used encode β-ketoacyl synthase domains from the first and second
modules of the rifamycin polyketide synthase (Schupp et al., FEMS Microbiology Letters
159: 201-207 (1998)). The probe DNAs are generated by PCR with primers flanking each
ketosynthase domain, using the plasmid pNE95 as the template (pNE95 equals cosmid 2
described in Schupp et al. (1998)). 25 ng of PCR-amplified DNA is isolated from a 0.5%
agarose gel and labeled with 32P-dCTP using a random primer labeling kit (Gibco-BRL,
Bethesda MD, USA) according to the manufacturer's instructions. Hybridization is at 65°C
for 36 hours, and membranes are washed at high stringency (3 times with 0.1 x SSC and
0.5% SDS for 20 min at 65°C). The labeled blot is exposed on a phosphorescent screen
and the signals are detected on a PhosphorImager 445SI (screen and 445SI from Molecular
Dynamics). This results in strong hybridization of certain Bac clones to the probes. These
clones are selected and cultured overnight in 5 ml of Luria broth (LB) at 37°C. Bac DNA
from the Bac clones of interest is isolated by a typical miniprep procedure. The cells are
resuspended in 200 µl lysozyme solution (50 mM glucose, 10 mM EDTA, 25 mM Tris-HCl,
5 mg/ml lysozyme) and lysed in 400 µl lysis solution (0.2 N NaOH and 2% SDS), the proteins
are precipitated (3.0 M potassium acetate, adjusted to pH 5.2 with acetic acid), and the Bac
DNA is precipitated with isopropanol. The DNA is resuspended in 20 µl of nuclease-free
distilled water, digested with BamHI (New England Biolabs, Inc.), and separated on a 0.7%
agarose gel. The gel is Southern-blotted as described above and probed, under the
conditions described above, with a 1.2 kb SmaI DNA fragment encoding the ketosynthase
domain of the fourth module of the soraphen polyketide synthase (see U.S. Patent No.
5,716,849). Five different hybridization patterns are observed. One clone representing each
of the five patterns is selected and named pEPO15, pEPO20, pEPO30, pEPO31, and
pEPO33, respectively.

Example 4: Subcloning of BamHI Fragments from pEPO15, pEPO20, pEPO30, pEPO31, and pEPO33

The DNA of the five selected Bac clones is digested with BamHI, and random fragments
are subcloned into pBluescript II SK+ (Stratagene) at the BamHI site. Subclones carrying
inserts between 2 and 10 kb in size are selected for sequencing of the flanking ends of the
inserts and are also probed with the 1.2 kb SmaI probe as described above. Subclones that
show a high degree of sequence homology to known polyketide synthases and/or strong
hybridization to the soraphen ketosynthase domain are used for gene disruption experiments.

Example 5: Preparation of Streptomycin-Resistant Spontaneous Mutants of Sorangium
cellulosum Strain So ce90

0.1 ml of a three-day-old culture of Sorangium cellulosum strain So ce90, grown in
liquid medium G52-H (0.2% yeast extract, 0.2% defatted soya meal, 0.8% potato starch,
0.2% glucose, 0.1% MgSO4 x 7H2O, 0.1% CaCl2 x 2H2O, 0.008% Fe-EDTA, pH adjusted to
7.4 with KOH), is plated out on agar plates with SolE medium supplemented with 100 µg/ml
streptomycin. The plates are incubated at 30°C for 2 weeks. The colonies growing on this
medium are streptomycin-resistant mutants, which are streaked out and cultivated once
more on the same agar medium with streptomycin for purification. One of these
streptomycin-resistant mutants is selected and is called BCE28/2.

Example 6: Gene Disruptions in Sorangium cellulosum BCE28/2 Using the Subcloned
BamHI Fragments

The BamHI inserts of the subclones generated from the five selected Bac clones as
described above are isolated and ligated into the unique BamHI site of plasmid pCIB132
(see U.S. Patent No. 5,716,849). The pCIB132 derivatives carrying the inserts are
transformed into Escherichia coli ED8767 containing the helper plasmid pUZ8 (Hedges and
Matthew, Plasmid 2: 269-278 (1979)), and the transformants are used as donors in
conjugation experiments with Sorangium cellulosum BCE28/2 as recipient. For the
conjugation, 5-10 x 10^9 cells of Sorangium cellulosum BCE28/2 from an early stationary
phase culture (reaching about 5 x 10^8 cells/ml) grown at 30°C in liquid medium G51b
(G51b equals medium G51t with tryptone replaced by peptone) are mixed in a 1:1 cellular
ratio with a late-log phase culture (in LB liquid medium) of E. coli ED8767 containing the
helper plasmid pUZ8 and a pCIB132 derivative carrying a subcloned BamHI fragment. The
mixed cells are then centrifuged at 4000 rpm for 10 minutes and resuspended in 0.5 ml
G51b medium. This cell suspension is then plated as a drop in the center of a plate with
SolE agar containing 50 mg/l kanamycin. The cells obtained after incubation for 24 hours at
30°C are harvested and resuspended in 0.8 ml of G51b medium, and 0.1 to 0.3 ml of this
suspension is plated out on a selective SolE solid medium containing phleomycin (30 mg/l),
streptomycin (300 mg/l), and kanamycin (50 mg/l). The counterselection of the donor
Escherichia coli strain takes place with the aid of streptomycin. The colonies that grow on
this selective medium after an incubation time of 8-12 days at 30°C are isolated with a
plastic loop, streaked out, and cultivated on the same agar medium for a second round of
selection and purification. The colony-derived cultures that grow on this selective agar
medium after 7 days at 30°C are transconjugants of Sorangium cellulosum BCE28/2 that
have acquired phleomycin resistance by conjugative transfer of the pCIB132 derivatives
carrying the subcloned BamHI fragments.
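
The conjugation mixes a defined number of recipient cells with donor cells at a 1:1 ratio. At
the stated density of about 5 x 10^8 cells/ml, 5-10 x 10^9 Sorangium cells correspond to
roughly 10-20 ml of recipient culture. The Python sketch below shows that arithmetic. The
helper name is hypothetical, and the E. coli donor density used in the example run is an
assumed placeholder, since the text does not state one; it would have to be determined for
the actual donor culture.

```python
# Sketch: culture volumes for the 1:1 donor:recipient conjugation mix.
# Assumption: cell densities (cells/ml) are known; the donor density used
# below is a hypothetical placeholder, not a value from the text.

def volume_for_cells(cells_needed: float, density_per_ml: float) -> float:
    """Volume (ml) of culture that contains cells_needed cells."""
    if density_per_ml <= 0:
        raise ValueError("density must be positive")
    return cells_needed / density_per_ml


RECIPIENT_DENSITY = 5e8      # S. cellulosum BCE28/2, early stationary phase (from the text)
ASSUMED_DONOR_DENSITY = 1e9  # hypothetical E. coli late-log density

if __name__ == "__main__":
    for recipient_cells in (5e9, 1e10):
        recipient_ml = volume_for_cells(recipient_cells, RECIPIENT_DENSITY)
        # 1:1 cellular ratio: the donor must supply the same number of cells.
        donor_ml = volume_for_cells(recipient_cells, ASSUMED_DONOR_DENSITY)
        print(f"{recipient_cells:.0e} recipient cells: {recipient_ml:.0f} ml recipient, "
              f"{donor_ml:.0f} ml donor (assumed density)")
```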

Integration of the pCIB132-derived plasmids into the chromosome of Sorangium
cellulosum BCE28/2 by homologous recombination is verified by Southern hybridization.
For this experiment, total DNA from 5-10 transconjugants per transferred BamHI fragment is
isolated (from 10 ml cultures grown in medium G52-H for three days) by the method
described by Pospiech and Neumann, Trends Genet. 11: 217 (1995). For the Southern blot,
the DNA isolated as described above is cleaved with the restriction enzyme BglII, ClaI, or
NotI, and the respective BamHI inserts or pCIB132 are used as 32P-labeled probes.

Example 7: Analysis of the Effect of the Integrated BamHI Fragments on Epothilone
Production by Sorangium cellulosum After Gene Disruption

Transconjugant cells grown on about 1 square cm surface of the selective SolE
plates of the second round of selection (see Example 6) are transferred with a sterile plastic
loop into 10 ml of medium G52-H in a 50 ml Erlenmeyer flask. After incubation at 30°C
and 180 rpm for 3 days, the culture is transferred into 50 ml of medium G52-H in a 200 ml
Erlenmeyer flask. After incubation at 30°C and 180 rpm for 4-5 days, 10 ml of this culture is
transferred into 50 ml of medium 23B3 (0.2% glucose, 2% potato starch, 1.6% defatted
soya meal, 0.0008% Fe-EDTA sodium salt, 0.5% HEPES (4-(2-hydroxyethyl)-piperazine-1-
ethane-sulfonic acid), 2% vol/vol polystyrene resin XAD16 (Rohm & Haas), pH adjusted to
7.8 with NaOH) in a 200 ml Erlenmeyer flask.

Quantitative determination of the epothilone produced takes place after incubation
of the cultures at 30°C and 180 rpm for 7 days. The complete culture broth is filtered by
suction through a 150 µm nylon filter. The resin remaining on the filter is then resuspended
in 10 ml isopropanol and extracted by shaking the suspension at 180 rpm for 1 hour. 1 ml
is removed from this suspension and centrifuged at 12,000 rpm in an Eppendorf microfuge.
The amount of epothilones A and B therein is determined by HPLC with detection at
250 nm using a UV/DAD detector (HPLC with a Waters Symmetry C18 column and a
gradient of 0.02% phosphoric acid 60%-0% and acetonitrile 40%-100%).
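
The HPLC conditions specify a gradient from 60% to 0% aqueous 0.02% phosphoric acid and,
correspondingly, from 40% to 100% acetonitrile. Neither the run time nor the gradient shape
is given. Assuming a linear gradient over a hypothetical run length, the mobile-phase
composition at any time point can be sketched in Python as follows.

```python
# Sketch: mobile-phase composition for the HPLC gradient in Example 7.
# Assumptions: the gradient is linear, and run_minutes is a hypothetical
# parameter because the text does not state the gradient duration.

AQUEOUS_START, AQUEOUS_END = 60.0, 0.0    # % of 0.02% phosphoric acid
ORGANIC_START, ORGANIC_END = 40.0, 100.0  # % acetonitrile


def composition(minute: float, run_minutes: float) -> tuple[float, float]:
    """Return (% aqueous, % acetonitrile) at a given time in the gradient."""
    if run_minutes <= 0:
        raise ValueError("run_minutes must be positive")
    fraction = min(max(minute / run_minutes, 0.0), 1.0)  # clamp to the gradient window
    aqueous = AQUEOUS_START + (AQUEOUS_END - AQUEOUS_START) * fraction
    organic = ORGANIC_START + (ORGANIC_END - ORGANIC_START) * fraction
    return aqueous, organic


if __name__ == "__main__":
    run = 20.0  # hypothetical gradient length in minutes
    for t in (0, 5, 10, 15, 20):
        aq, org = composition(t, run)
        print(f"t={t:>2} min: {aq:5.1f}% aqueous, {org:5.1f}% acetonitrile")
```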

Transconjugants with three different integrated BamHI fragments subcloned from
pEPO15, namely transconjugants with the BamHI fragment of plasmid pEPO15-21,
transconjugants with the BamHI fragment of plasmid pEPO15-4-5, and transconjugants with
the BamHI fragment of plasmid pEPO15-4-1, are tested in the manner described above. HPLC
analysis reveals that none of these transconjugants produce epothilone A or B. By contrast,
epothilones A and B are detectable at a concentration of 2-4 mg/l in transconjugants with
integrated BamHI fragments derived from pEPO20, pEPO30, pEPO31, and pEPO33, and in
the parental strain BCE28/2.

Example 8: Nucleotide Sequence Determination of the Cloned Fragments and
Construction of Contigs

A. BamHI Insert of Plasmid pEPO15-21

Plasmid DNA is isolated from the strain Escherichia coli DH10B [pEPO15-21], and
the nucleotide sequence of the 2.3-kb BamHI insert in pEPO15-21 is determined. Automated
DNA sequencing is done on the double-stranded DNA template by the dideoxynucleotide
chain termination method, using Applied Biosystems model 377 sequencers. The primers
used are the universal reverse primer (5' GGA AAC AGC TAT GAC CAT G 3' (SEQ ID
NO:24)) and the universal forward primer (5' GTA AAA CGA CGG CCA GT 3' (SEQ ID
NO:25)). In subsequent rounds of sequencing reactions, custom-synthesized
oligonucleotides, designed for the 3' ends of the previously determined sequences, are
used to extend and join contigs. Both strands are entirely sequenced, and every nucleotide
is sequenced at least twice. The nucleotide sequence is compiled using the program
Sequencher version 3.0 (Gene Codes Corporation) and analyzed using the University of
Wisconsin Genetics Computer Group programs. The nucleotide sequence of the 2213-bp
insert corresponds to nucleotides 20779-22991 of SEQ ID NO:1.

B. BamHI Insert of Plasmid pEPO15-4-1

Plasmid DNA is isolated from the strain Escherichia coli DH10B [pEPO15-4-1], and
the nucleotide sequence of the 3.9-kb BamHI insert in pEPO15-4-1 is determined as
described in (A) above. The nucleotide sequence of the 3909-bp insert corresponds to
nucleotides 16876-20784 of SEQ ID NO:1.

C. BamHI Insert of Plasmid pEPO15-4-5

Plasmid DNA is isolated from the strain Escherichia coli DH10B [pEPO15-4-5], and
the nucleotide sequence of the 2.3-kb BamHI insert in pEPO15-4-5 is determined as
described in (A) above. The nucleotide sequence of the 2233-bp insert corresponds to
nucleotides 42528-44760 of SEQ ID NO:1.
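
The three insert coordinates can be cross-checked arithmetically. An inclusive range a-b spans
b - a + 1 bp, which reproduces the stated insert sizes of 2213, 3909, and 2233 bp. The
ranges for pEPO15-4-1 (16876-20784) and pEPO15-21 (20779-22991) share 6 positions, the
length of a BamHI recognition site (GGATCC), as would be expected if both reported insert
sequences include the site at which the two adjacent BamHI fragments meet. A Python sketch
of the check follows; the function names are illustrative only.

```python
# Sketch: checking insert sizes and overlaps against SEQ ID NO:1 coordinates.
# Coordinates are taken from Example 8; ranges are 1-based and inclusive.

INSERTS = {
    "pEPO15-4-1": (16876, 20784),
    "pEPO15-21": (20779, 22991),
    "pEPO15-4-5": (42528, 44760),
}


def span(start: int, end: int) -> int:
    """Inclusive length in bp."""
    return end - start + 1


def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of shared positions between two inclusive ranges (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


if __name__ == "__main__":
    for name, (start, end) in INSERTS.items():
        print(f"{name}: {span(start, end)} bp")
    print("4-1 / 21 overlap:", overlap(INSERTS["pEPO15-4-1"], INSERTS["pEPO15-21"]), "bp")
    print("21 / 4-5 overlap:", overlap(INSERTS["pEPO15-21"], INSERTS["pEPO15-4-5"]), "bp")
```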

Example 9: Subcloning and Ordering of DNA Fragments from pEPO15 Containing
Epothilone Biosynthesis Genes

pEPO15 is digested to completion with the restriction enzyme HindIII, and the resulting
fragments are subcloned into pBluescript II SK- or pNEB193 (New England Biolabs)
that has been cut with HindIII and dephosphorylated with calf intestinal alkaline
phosphatase. Six different clones are generated and named pEPO15-NH1, pEPO15-NH2,
pEPO15-NH6, pEPO15-NH24 (all based on pNEB193), and pEPO15-H2.7 and
pEPO15-H3.0 (both based on pBluescript II SK-).

The BamHI insert of pEPO15-21 is isolated, DIG-labeled (Non-radioactive DNA
labeling and detection system, Boehringer Mannheim), and used as a probe in DNA
hybridization experiments at high stringency against pEPO15-NH1, pEPO15-NH2,
pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7 and pEPO15-H3.0. A strong hybridization signal
is detected for pEPO15-NH24, indicating that the pEPO15-21 insert is contained within
pEPO15-NH24.

The BamHI insert of pEPO15-4-1 is isolated and DIG-labeled as above, and used as
a probe in DNA hybridization experiments at high stringency against pEPO15-NH1,
pEPO15-NH2, pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7 and pEPO15-H3.0. Strong
hybridization signals are detected for pEPO15-NH24 and pEPO15-H2.7. Nucleotide
sequence data generated from one end each of pEPO15-NH24 and pEPO15-H2.7 are also in
complete agreement with the previously determined sequence of the BamHI insert of
pEPO15-4-1. These experiments demonstrate that the pEPO15-4-1 insert (which contains one
internal HindIII site) overlaps pEPO15-H2.7 and pEPO15-NH24, and that pEPO15-H2.7 and
pEPO15-NH24, in this order, are contiguous.

The BamHI insert of pEPO15-4-5 is isolated and DIG-labeled as above, and used as
a probe in DNA hybridization experiments at high stringency against pEPO15-NH1,
pEPO15-NH2, pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7 and pEPO15-H3.0. A strong
hybridization signal is detected for pEPO15-NH2, indicating that the pEPO15-4-5 insert is
contained within pEPO15-NH2.

Nucleotide sequence data are generated from both ends of pEPO15-NH2 and from
the end of pEPO15-NH24 that does not overlap with pEPO15-4-1. PCR primers NH24 end
"B": GTGACTGGCGCCTGGAATCTGCATGAGC (SEQ ID NO:26), NH2 end "A":
AGCGGGAGCTTGCTAGACATTCTGTTTC (SEQ ID NO:27), and NH2 end "B":
GACGCGCCTCGGGCAGCGCCCCAA (SEQ ID NO:28), pointing towards the HindIII sites,
are designed based on these sequences and used in amplification reactions with pEPO15
and, in separate experiments, with Sorangium cellulosum So ce90 genomic DNA as the
templates. Specific amplification is found with the primer pair NH24 end "B" and NH2 end
"A" with both templates. The amplimers are cloned into pBluescript II SK- and completely
sequenced. The sequences of the amplimers are identical, and also agree completely with
the end sequences of pEPO15-NH24 and pEPO15-NH2, fused at the HindIII site,
establishing that the HindIII fragments of pEPO15-NH24 and pEPO15-NH2 are, in this
order, contiguous.
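
The logic of this junction test can be stated compactly. If two HindIII fragments are truly
adjacent in the genome, a PCR product primed from both insert ends must equal the left end
sequence and the right end sequence joined at a single shared HindIII site (AAGCTT). A
minimal Python sketch follows. It assumes the end reads have been trimmed to the
primer-to-site interval, and the short sequences used are toy placeholders, not the actual end
sequences.

```python
# Sketch: does an amplimer match two insert-end sequences fused at a HindIII site?
# Assumptions: left_end runs from one primer up to and including the HindIII
# site; right_end runs from that same site to the other primer's binding site.

HINDIII_SITE = "AAGCTT"


def fuse_at_site(left_end: str, right_end: str, site: str = HINDIII_SITE) -> str:
    """Join two end sequences that share one restriction site, counting it once."""
    if not left_end.endswith(site) or not right_end.startswith(site):
        raise ValueError("both end sequences must carry the shared site at the junction")
    return left_end + right_end[len(site):]


def amplimer_matches(amplimer: str, left_end: str, right_end: str) -> bool:
    """True if the amplimer is exactly the two ends fused at the shared site."""
    return amplimer.upper() == fuse_at_site(left_end.upper(), right_end.upper())


if __name__ == "__main__":
    left = "GTGACTGGCAAGCTT"  # toy sequence, not from SEQ ID NO:1
    right = "AAGCTTCCGATGA"   # toy sequence, not from SEQ ID NO:1
    print(fuse_at_site(left, right))
    print(amplimer_matches("GTGACTGGCAAGCTTCCGATGA", left, right))
```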

The HindIII insert of pEPO15-H2.7 is isolated and DIG-labeled as above, and used as
a probe in a DNA hybridization experiment at high stringency against pEPO15 digested with
NotI. A NotI fragment of about 9 kb shows strong hybridization and is further subcloned
into pBluescript II SK- that has been digested with NotI and dephosphorylated with calf
intestinal alkaline phosphatase, to yield pEPO15-N9-16. The NotI insert of pEPO15-N9-16
is isolated and DIG-labeled as above, and used as a probe in DNA hybridization
experiments at high stringency against pEPO15-NH1, pEPO15-NH2, pEPO15-NH6,
pEPO15-NH24, pEPO15-H2.7 and pEPO15-H3.0. Strong hybridization signals are detected
for pEPO15-NH6, and also for the expected clones pEPO15-H2.7 and pEPO15-NH24.
Nucleotide sequence data are generated from both ends of pEPO15-NH6 and from the end
of pEPO15-H2.7 that does not overlap with pEPO15-4-1. PCR primers are designed
pointing towards the HindIII sites and used in amplification reactions with pEPO15 and, in
separate experiments, with Sorangium cellulosum So ce90 genomic DNA as the templates.
Specific amplification is found with the primer pair pEPO15-NH6 end "B":
CACCGAAGCGTCGATCTGGTCCATC (SEQ ID NO:29) and pEPO15-H2.7 end "A":
CGGTCAGATCGACGACGGGCTTTCC (SEQ ID NO:30) with both templates. The amplimers
are cloned into pBluescript II SK- and completely sequenced. The sequences of the
amplimers are identical, and also agree completely with the end sequences of
pEPO15-NH6 and pEPO15-H2.7, fused at the HindIII site, establishing that the HindIII
fragments of pEPO15-NH6 and pEPO15-H2.7 are, in this order, contiguous.

All of these experiments, taken together, establish a contig of HindIII fragments
covering a region of about 55 kb and consisting of the HindIII inserts of pEPO15-NH6,
pEPO15-H2.7, pEPO15-NH24, and pEPO15-NH2, in this order. The inserts of the
remaining two HindIII subclones, namely pEPO15-NH1 and pEPO15-H3.0, are not found to
be parts of this contig.
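
The ordering argument in this example chains pairwise adjacency results ("X and Y, in this
order, are contiguous") into a single linear contig. The Python sketch below illustrates that
bookkeeping. The function name is hypothetical, and the sketch assumes the adjacencies form
one unbranched chain, rejecting inputs that do not.

```python
# Sketch: assembling an ordered contig from pairwise "X precedes Y" adjacencies.
# Assumption: the adjacencies describe one linear chain with no branches.

def order_contig(adjacent_pairs: list[tuple[str, str]]) -> list[str]:
    successor: dict[str, str] = {}
    has_predecessor: set[str] = set()
    for left, right in adjacent_pairs:
        if left in successor or right in has_predecessor:
            raise ValueError(f"conflicting adjacency involving {left} and {right}")
        successor[left] = right
        has_predecessor.add(right)

    # The contig starts at the only fragment that never appears on the right.
    starts = [frag for frag in successor if frag not in has_predecessor]
    if len(starts) != 1:
        raise ValueError("adjacencies do not form a single linear contig")

    order = [starts[0]]
    while order[-1] in successor:
        nxt = successor[order[-1]]
        if nxt in order:
            raise ValueError("adjacencies contain a cycle")
        order.append(nxt)

    if len(order) != len(set(successor) | has_predecessor):
        raise ValueError("adjacencies form more than one chain")
    return order


if __name__ == "__main__":
    # Adjacencies established in Example 9, in the order they were found.
    pairs = [
        ("pEPO15-H2.7", "pEPO15-NH24"),
        ("pEPO15-NH24", "pEPO15-NH2"),
        ("pEPO15-NH6", "pEPO15-H2.7"),
    ]
    print(" -> ".join(order_contig(pairs)))
```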

Example 10: Further Extension of the Subclone Contig Covering the Epothilone
Biosynthesis Genes

An approximately 2.2 kb BamHI-HindIII fragment derived from the downstream end
of the insert of pEPO15-NH2, and thus representing the downstream end of the subclone
contig described in Example 9, is isolated, DIG-labeled, and used in Southern hybridization
experiments against pEPO15 and pEPO15-NH2 DNAs digested with several enzymes. The
strongly hybridizing bands are always the same size in the two target DNAs, indicating that
the Sorangium cellulosum So ce90 genomic DNA fragment cloned into pEPO15 ends with the
HindIII site at the downstream end of pEPO15-NH2.

A cosmid DNA library of Sorangium cellulosum So ce90 is generated in pScosTriplex-II
using established procedures (Ji et al., Genomics 31: 185-192 (1996)). Briefly,
high-molecular-weight genomic DNA of Sorangium cellulosum So ce90 is partially digested
with the restriction enzyme Sau3AI to provide fragments with an average size of about
40 kb. These fragments are ligated to BamHI- and XbaI-digested pScosTriplex-II. The
ligation mix is packaged with Gigapack III XL (Stratagene) and used to transfect E. coli
XL1-Blue MR cells.
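The number of cosmid clones a library of this kind needs is commonly estimated with the Clarke-Carbon formula, N = ln(1 - P) / ln(1 - I/G). In that formula, I is the average insert size, G is the genome size, and P is the desired probability that a given locus is represented. This estimate is not part of the procedure described here. It is a minimal sketch, and the genome size in the usage line is a hypothetical round figure chosen for illustration, not a value given in this document.

```python
import math


def clones_needed(insert_kb: float, genome_kb: float, probability: float) -> int:
    """Clarke-Carbon estimate: N = ln(1 - P) / ln(1 - I/G), rounded up."""
    if not 0 < insert_kb < genome_kb:
        raise ValueError("insert size must be positive and smaller than the genome")
    if not 0 < probability < 1:
        raise ValueError("probability must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - probability) / math.log(1 - insert_kb / genome_kb))


# 40 kb average insert (as above); the 12,000 kb genome size is a
# hypothetical round figure chosen only for illustration.
print(clones_needed(40, 12_000, 0.99))
```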

The cosmid library is screened by colony hybridization. The probe is the approximately
2.2 kb BamHI-HindIII fragment derived from the downstream end of the insert of
pEP015-NH2. A strongly hybridizing clone, named pEP04E7, is selected.

pEP04E7 DNA is isolated, digested with several restriction endonucleases, and probed in
Southern hybridization experiments with the 2.2 kb BamHI-HindIII fragment. A strongly
hybridizing NotI fragment of approximately 9 kb is selected and subcloned into
pBluescript II SK- to yield pEP04E7-N9-8. Further Southern hybridization experiments
reveal that the approximately 9 kb NotI insert of pEP04E7-N9-8 overlaps pEP015-NH2 over
6 kb in a NotI-HindIII fragment, while the remaining approximately 3 kb HindIII-NotI
fragment would extend the subclone contig described in Example 9. End sequencing reveals,
however, that the downstream end of the insert of pEP04E7-N9-8 contains the BamHI-NotI
polylinker of pScosTriplex-II. This indicates that the genomic DNA insert of pEP04E7 ends
at a Sau3AI site within the extending HindIII-NotI fragment, and that the NotI site is
derived from pScosTriplex-II.

An approximately 1.6 kb PstI-SalI fragment is derived from the approximately 3 kb
extending HindIII-NotI subfragment of pEP04E7-N9-8 and contains only Sorangium
cellulosum So ce90-derived sequences, free of vector. It is used as a probe against the
bacterial artificial chromosome library described in Example 2. Besides the previously
isolated EP015, a BAC clone named EP032 is found to hybridize strongly to the probe.
pEP032 is isolated, digested with several restriction endonucleases, and hybridized with
the approximately 1.6 kb PstI-SalI probe. A HindIII-EcoRV fragment of about 13 kb is
found to hybridize strongly to the probe. It is subcloned into pBluescript II SK-
digested with HindIII and HincII to yield pEP032-HEV15.

Oligonucleotide primers are designed based on the downstream end sequence of
pEP015-NH2 and on the upstream (HindIII) end sequence derived from pEP032-HEV15. These
primers are used in sequencing reactions with pEP04E7-N9-8 as the template. The sequences
reveal a small HindIII fragment (EP04E7-H0.02) of 24 bp, undetectable by standard
restriction analysis. This fragment separates the HindIII site at the downstream end of
pEP015-NH2 from the HindIII site at the upstream end of pEP032-HEV15.
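Computing fragment sizes directly shows why such a fragment escapes standard restriction analysis. HindIII recognizes AAGCTT and cleaves between the first A and AGCTT. Two sites whose cut points lie 24 bp apart therefore release a fragment far too small to be seen alongside multi-kilobase fragments on an ordinary agarose gel. The sketch below is minimal. It handles a linear sequence, considers top-strand cut positions only, and uses a synthetic demonstration sequence that is not taken from SEQ ID NO:1.

```python
import re

HINDIII_SITE = "AAGCTT"
CUT_OFFSET = 1  # HindIII cleaves A^AGCTT, i.e. after the first base of the site


def hindiii_fragment_lengths(seq: str) -> list[int]:
    """Fragment lengths (bp) from a complete HindIII digest of a linear sequence."""
    seq = seq.upper()
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(HINDIII_SITE, seq)]
    bounds = [0, *cuts, len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Synthetic test sequence: two HindIII sites whose cut points lie 24 bp apart.
demo = "CCCC" + "AAGCTT" + "G" * 18 + "AAGCTT" + "CCCC"
print(hindiii_fragment_lengths(demo))  # [5, 24, 9]
```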

Thus, the subclone contig described in Example 9 is extended to include the HindIII
fragment EP04E7-H0.02 and the insert of pEP032-HEV15. The extended contig consists of the
inserts of pEP015-NH6, pEP015-H2.7, pEP015-NH24, pEP015-NH2, EP04E7-H0.02 and
pEP032-HEV15, in this order.

Example 11: Nucleotide Sequence Determination of the Subclone Contig Covering the Epothilone Biosynthesis Genes

The nucleotide sequence of the subclone contig described in Example 10 is 
determined as follows. 

pEP015-H2.7. Plasmid DNA is isolated from the strain Escherichia coli DH10B
[pEP015-H2.7], and the nucleotide sequence of the 2.7 kb BamHI insert in pEP015-H2.7 is
determined. Automated DNA sequencing is performed on the double-stranded DNA template by
the dideoxynucleotide chain termination method, using Applied Biosystems model 377
sequencers. The primers used are the universal reverse primer (5' GGA AAC AGC TAT GAC CAT
G 3' (SEQ ID NO:24)) and the universal forward primer (5' GTA AAA CGA CGG CCA GT 3' (SEQ
ID NO:25)). In subsequent rounds of sequencing reactions, custom-synthesized
oligonucleotides designed for the 3' ends of the previously determined sequences are used
to extend and join contigs.
pEP015-NH6, pEP015-NH24 and pEP015-NH2. The HindIII inserts of these plasmids are
isolated and subjected to random fragmentation using a HydroShear apparatus (Genomic
Instrumentation Services, Inc.) to yield an average fragment size of 1-2 kb. The fragments
are end-repaired using T4 DNA polymerase and Klenow DNA polymerase in the presence of
deoxynucleotide triphosphates, and phosphorylated with T4 DNA kinase in the presence of
ribo-ATP. Fragments in the size range of 1.5-2.2 kb are isolated from agarose gels and
ligated into pBluescript II SK- that has been cut with EcoRV and dephosphorylated. Random
subclones are sequenced using the universal reverse and universal forward primers.

pEP032-HEV15. pEP032-HEV15 is digested with HindIII and SspI. The approximately 13.3 kb
fragment is isolated; it contains the approximately 13 kb HindIII-EcoRV insert from
S. cellulosum So ce90 and a 0.3 kb HincII-SspI fragment from pBluescript II SK-. This
fragment is partially digested with HaeIII to yield fragments with an average size of
1-2 kb. Fragments in the size range of 1.5-2.2 kb are isolated from agarose gels and
ligated into pBluescript II SK- that has been cut with EcoRV and dephosphorylated. Random
subclones are sequenced using the universal reverse and universal forward primers.

The chromatograms are analyzed and assembled into contigs with the Phred, Phrap and
Consed programs (Ewing et al., Genome Res. 8(3): 175-185 (1998); Ewing et al., Genome Res.
8(3): 186-194 (1998); Gordon et al., Genome Res. 8(3): 195-202 (1998)). Contig gaps are
filled, sequence discrepancies are resolved, and low-quality regions are resequenced. For
this, custom-designed oligonucleotide primers are used to sequence either the original
subclones or selected clones from the random subclone libraries. Both strands are
completely sequenced, and every base pair has an aggregated Phred score of at least 40
(confidence level of 99.99%).
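The 99.99% figure follows from the definition of the Phred quality score, Q = -10 log10(P), where P is the estimated probability that a base call is wrong. A score of 40 therefore corresponds to P = 10^-4. The minimal sketch below shows the conversion in both directions. The function names are illustrative, and aggregation of scores across strands is not modelled.

```python
import math


def phred_to_error(q: float) -> float:
    """Probability that a base call of Phred quality q is wrong: P = 10**(-q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality of a base call with error probability p: Q = -10*log10(p)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


q = 40
print(f"Q{q}: error probability {phred_to_error(q):g}, accuracy {1 - phred_to_error(q):.2%}")
```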

The nucleotide sequence of the 68750 bp contig is shown as SEQ ID NO:1. 
Example 12: Nucleotide Sequence Analysis of the Epothilone Biosynthesis Genes

SEQ ID NO:1 is found to contain 22 ORFs, as detailed below in Table 1:

Table 1

ORF | Start codon | Stop codon | Homology of deduced protein | Proposed function of deduced protein
orf1 | outside of sequenced range | 1826 | - | -
orf2* | 3171 | 1900 | Hypothetical protein SP:Q11037; DD-peptidase SP:P15555 | -
orf3 | 3415 | 5556 | Na/H antiporter PID:D1017724 | Transport
orf4* | 5992 | 5612 | - | -
orf5 | 6226 | 6675 | - | -
epoA | 7610 | 11875 | Type I polyketide synthase | Epothilone synthase: thiazole ring formation
epoP | 11872 | 16104 | Non-ribosomal peptide synthetase | Epothilone synthase: thiazole ring formation
epoB | 16251 | 21749 | Type I polyketide synthase | Epothilone synthase: polyketide backbone formation
epoC | 21746 | 43519 | Type I polyketide synthase | Epothilone synthase: polyketide backbone formation
epoD | 43524 | 54920 | Type I polyketide synthase | Epothilone synthase: polyketide backbone formation
epoE | 54935 | 62254 | Type I polyketide synthase | Epothilone synthase: polyketide backbone formation
epoF | 62369 | 63628 | Cytochrome P450 | Epothilone macrolactone oxidase
orf6 | 63779 | 64333 | - | -
orf7* | 64290 | 63853 | - | -
orf8 | 64363 | 64920 | - | -
orf9* | 64727 | 64287 | - | -
orf10 | 65063 | 65767 | - | -
orf11* | 65874 | 65008 | - | -
orf12* | 66338 | 65871 | - | -
orf13 | 66667 | 67137 | - | -
orf14 | 67334 | 68251 | Hypothetical protein GI:3293544; cation efflux system protein GI:2623026 | Transport
orf15 | 68346 | outside of sequenced range | - | -

* On the reverse complement strand. Numbering according to SEQ ID NO:1.
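Table 1 encodes the strand in the coordinate order: where the start codon position is larger than the stop codon position (marked *), the ORF lies on the reverse complement strand. The minimal sketch below extracts a coding sequence and its basic statistics from such coordinates. It assumes 1-based, inclusive positions, with the stop coordinate marking the last base of the stop codon. The SEQ ID NO:1 sequence string itself must be supplied by the user, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_orf(seq: str, start: int, stop: int) -> str:
    """Coding sequence between 1-based, inclusive start and stop codon coordinates.

    As in Table 1, start > stop marks an ORF on the reverse complement strand.
    """
    if start <= stop:
        return seq[start - 1:stop]
    return reverse_complement(seq[stop - 1:start])


def orf_summary(start: int, stop: int) -> tuple[str, int, int]:
    """Strand, length in nt, and encoded amino acids (stop codon excluded)."""
    strand = "+" if start <= stop else "-"
    length = abs(stop - start) + 1
    return strand, length, length // 3 - 1


print(orf_summary(3171, 1900))   # orf2, reverse strand
print(orf_summary(7610, 11875))  # epoA, forward strand
```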



epoA (nucleotides 7610-11875 of SEQ ID NO:1) codes for EPOS A (SEQ ID NO:2), a type I
polyketide synthase consisting of a single module and harboring the following domains:
β-ketoacyl synthase (KS) (nucleotides 7643-8920 of SEQ ID NO:1, amino acids 11-437 of
SEQ ID NO:2); acyltransferase (AT) (nucleotides 9236-10201 of SEQ ID NO:1, amino acids
543-864 of SEQ ID NO:2); enoyl reductase (ER) (nucleotides 10529-11428 of SEQ ID NO:1,
amino acids 974-1273 of SEQ ID NO:2); and acyl carrier protein homologous domain (ACP)
(nucleotides 11549-11764 of SEQ ID NO:1, amino acids 1314-1385 of SEQ ID NO:2). Sequence
comparisons and motif analysis (Haydock et al., FEBS Lett. 374: 246-248 (1995); Tang et
al., Gene 216: 255-265 (1998)) reveal that the AT domain of EPOS A is specific for
malonyl-CoA. EPOS A is therefore expected to initiate epothilone biosynthesis by loading
onto the multienzyme complex the acetate unit that eventually forms part of the
2-methylthiazole ring (C26 and C20).

epoP (nucleotides 11872-16104 of SEQ ID NO:1) codes for EPOS P (SEQ ID NO:3), a
non-ribosomal peptide synthetase containing one module. EPOS P harbors the following
domains:

• a peptide bond formation domain, as delineated by motif K (amino acids 72-81
[FPLTDIQESY] of SEQ ID NO:3, corresponding to nucleotide positions 12085-12114 of SEQ ID
NO:1); motif L (amino acids 118-125 [VVARHDML] of SEQ ID NO:3, corresponding to nucleotide
positions 12223-12246 of SEQ ID NO:1); motif M (amino acids 199-212 [SIDLINVDLGSLSI] of
SEQ ID NO:3, corresponding to nucleotide positions 12466-12507 of SEQ ID NO:1); and motif
O (amino acids 353-363 [GDFTSMVLLDI] of SEQ ID NO:3, corresponding to nucleotide positions
12928-12960 of SEQ ID NO:1);

• an aminoacyl adenylate formation domain, as delineated by motif A (amino acids 549-565
[LTYEELSRRSRRLGARL] of SEQ ID NO:3, corresponding to nucleotide positions 13516-13566 of
SEQ ID NO:1); motif B (amino acids 588-603 [VAVLAVLESGAAYVPI] of SEQ ID NO:3,
corresponding to nucleotide positions 13633-13680 of SEQ ID NO:1); motif C (amino acids
669-684 [AYVIYTSGSTGLPKGV] of SEQ ID NO:3, corresponding to nucleotide positions
13876-13923 of SEQ ID NO:1); motif D (amino acids 815-821 [SLGGATE] of SEQ ID NO:3,
corresponding to nucleotide positions 14313-14334 of SEQ ID NO:1); motif E (amino acids
868-892 [GQLYIGGVGLALGYWRDEEKTRKSF] of SEQ ID NO:3, corresponding to nucleotide positions
14473-14547 of SEQ ID NO:1); motif F (amino acids 903-912 [YKTGDLGRYL] of SEQ ID NO:3,
corresponding to nucleotide positions 14578-14607 of SEQ ID NO:1); motif G (amino acids
918-940 [EFMGREDNQIKLRGYRVELGEIE] of SEQ ID NO:3, corresponding to nucleotide positions
14623-14692 of SEQ ID NO:1); motif H (amino acids 1268-1274 [LPEYMVP] of SEQ ID NO:3,
corresponding to nucleotide positions 15673-15693 of SEQ ID NO:1); and motif I (amino acids
1285-1297 [LTSNGKVDRKALR] of SEQ ID NO:3, corresponding to nucleotide positions
15724-15762 of SEQ ID NO:1);

• an unknown domain, inserted between motifs G and H of the aminoacyl adenylate formation
domain (amino acids 973-1256 of SEQ ID NO:3, corresponding to nucleotide positions
14788-15639 of SEQ ID NO:1); and

• a peptidyl carrier protein homologous domain (PCP), delineated by motif J (amino acids
1344-1351 [GATSIHIV] of SEQ ID NO:3, corresponding to nucleotide positions 15901-15924 of
SEQ ID NO:1).
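The nucleotide positions quoted for each motif follow from the amino acid positions and the start codon of the gene. For epoP, which starts at nucleotide 11872, residue n occupies nucleotides start + 3(n - 1) through start + 3n - 1. The minimal sketch below converts amino acid ranges to nucleotide ranges for genes on the forward strand. The function name is illustrative.

```python
def aa_to_nt(gene_start: int, first_aa: int, last_aa: int) -> tuple[int, int]:
    """Map a 1-based amino acid range of a forward-strand protein onto SEQ ID NO:1.

    Codon n of a gene whose start codon begins at gene_start occupies
    nucleotides gene_start + 3*(n - 1) through gene_start + 3*n - 1.
    """
    if not 1 <= first_aa <= last_aa:
        raise ValueError("invalid amino acid range")
    return gene_start + 3 * (first_aa - 1), gene_start + 3 * last_aa - 1


EPOP_START = 11872
print(aa_to_nt(EPOP_START, 72, 81))  # motif K -> (12085, 12114)
```

The same function reproduces the domain boundaries given for the other forward-strand genes, for example the AT of EPOS A (gene start 7610, amino acids 543-864).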

It is proposed that EPOS P is involved in four steps: activation of a cysteine by
adenylation; binding of the activated cysteine as an aminoacyl-S-PCP; formation of a
peptide bond between the enzyme-bound cysteine and the acetyl-S-ACP supplied by EPOS A;
and formation of the initial thiazoline ring by intramolecular heterocyclization. The
unknown domain of EPOS P displays very weak homologies to NAD(P)H oxidases and reductases
from Bacillus species. Thus, this unknown domain and/or the ER domain of EPOS A may be
involved in the oxidation of the initial 2-methylthiazoline ring to a 2-methylthiazole.

epoB (nucleotides 16251-21749 of SEQ ID NO:1) codes for EPOS B (SEQ ID NO:4), a type I
polyketide synthase consisting of a single module and harboring the following domains: KS
(nucleotides 16269-17546 of SEQ ID NO:1, amino acids 7-432 of SEQ ID NO:4); AT
(nucleotides 17865-18827 of SEQ ID NO:1, amino acids 539-859 of SEQ ID NO:4); dehydratase
(DH) (nucleotides 18855-19361 of SEQ ID NO:1, amino acids 869-1037 of SEQ ID NO:4);
β-ketoreductase (KR) (nucleotides 20565-21302 of SEQ ID NO:1, amino acids 1439-1684 of
SEQ ID NO:4); and ACP (nucleotides 21414-21626 of SEQ ID NO:1, amino acids 1722-1792 of
SEQ ID NO:4). Sequence comparisons and motif analysis reveal that the AT domain of EPOS B
is specific for methylmalonyl-CoA. EPOS B is therefore expected to carry out the first
polyketide chain extension. It would catalyse the Claisen-like condensation of the
2-methyl-4-thiazolecarboxyl-S-PCP starter group with methylmalonyl-S-ACP, with concomitant
reduction of the β-keto group at C17 to an enoyl.

epoC (nucleotides 21746-43519 of SEQ ID NO:1) codes for EPOS C (SEQ ID NO:5), a type I
polyketide synthase consisting of four modules.

The first module harbors a KS (nucleotides 21860-23116 of SEQ ID NO:1, amino acids 39-457
of SEQ ID NO:5); a malonyl-CoA-specific AT (nucleotides 23431-24397 of SEQ ID NO:1, amino
acids 563-884 of SEQ ID NO:5); a KR (nucleotides 25184-25942 of SEQ ID NO:1, amino acids
1147-1399 of SEQ ID NO:5); and an ACP (nucleotides 26045-26263 of SEQ ID NO:1, amino acids
1434-1506 of SEQ ID NO:5). This module incorporates an acetate extender unit (C14-C13) and
reduces the β-keto group at C15 to the hydroxyl group that takes part in the final
lactonization of the epothilone macrolactone ring.

The second module of EPOS C harbors a KS (nucleotides 26318-27595 of SEQ ID NO:1, amino
acids 1524-1950 of SEQ ID NO:5); a malonyl-CoA-specific AT (nucleotides 27911-28876 of SEQ
ID NO:1, amino acids 2056-2377 of SEQ ID NO:5); a KR (nucleotides 29678-30429 of SEQ ID
NO:1, amino acids 2645-2895 of SEQ ID NO:5); and an ACP (nucleotides 30539-30759 of SEQ ID
NO:1, amino acids 2932-3005 of SEQ ID NO:5). This module incorporates an acetate extender
unit (C12-C11) and reduces the β-keto group at C13 to a hydroxyl group. Because this
module uses malonyl-CoA, the nascent polyketide chain carries no methyl group at C12 and
thus corresponds to epothilone A. The methyl side chain at C12 of epothilone B would
therefore require a post-PKS C-methyltransferase activity. The formation of the epoxide
ring at C13-C12 would also require a post-PKS oxidation step.

The third module of EPOS C harbors a KS (nucleotides 30815-32092 of SEQ ID NO:1, amino
acids 3024-3449 of SEQ ID NO:5); a malonyl-CoA-specific AT (nucleotides 32408-33373 of SEQ
ID NO:1, amino acids 3555-3876 of SEQ ID NO:5); a DH (nucleotides 33401-33889 of SEQ ID
NO:1, amino acids 3886-4048 of SEQ ID NO:5); an ER (nucleotides 35042-35902 of SEQ ID NO:1,
amino acids 4433-4719 of SEQ ID NO:5); a KR (nucleotides 35930-36667 of SEQ ID NO:1, amino
acids 4729-4974 of SEQ ID NO:5); and an ACP (nucleotides 36773-36991 of SEQ ID NO:1, amino
acids 5010-5082 of SEQ ID NO:5). This module incorporates an acetate extender unit
(C10-C9) and fully reduces the β-keto group at C11.

The fourth module of EPOS C harbors a KS (nucleotides 37052-38320 of SEQ ID NO:1, amino
acids 5103-5525 of SEQ ID NO:5); a methylmalonyl-CoA-specific AT (nucleotides 38636-39598
of SEQ ID NO:1, amino acids 5631-5951 of SEQ ID NO:5); a DH (nucleotides 39635-40141 of
SEQ ID NO:1, amino acids 5964-6132 of SEQ ID NO:5); an ER (nucleotides 41369-42256 of SEQ
ID NO:1, amino acids 6542-6837 of SEQ ID NO:5); a KR (nucleotides 42314-43048 of SEQ ID
NO:1, amino acids 6857-7101 of SEQ ID NO:5); and an ACP (nucleotides 43163-43378 of SEQ ID
NO:1, amino acids 7140-7211 of SEQ ID NO:5). This module incorporates a propionate extender
unit (C24 and C8-C7) and fully reduces the β-keto group at C9.
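These module assignments follow the generic reductive-loop logic of modular type I PKSs:

- A KR alone leaves a hydroxyl.
- KR plus DH leaves a double bond (enoyl).
- KR, DH and ER together fully reduce the β-carbon.

In addition, a malonyl-CoA-specific AT contributes an acetate unit and a methylmalonyl-CoA-specific AT a propionate unit. The sketch below encodes only that textbook rule, for EPOS B and the four modules of EPOS C. It assumes every listed domain is active, so it does not model the discrepancies noted in this Example for the second module of EPOS D and for EPOS E. The names used are illustrative.

```python
from enum import Enum


class BetaCarbon(Enum):
    KETO = "keto"
    HYDROXYL = "hydroxyl"
    ENOYL = "enoyl"
    METHYLENE = "fully reduced"


# Extender unit delivered by each AT specificity.
EXTENDER_UNIT = {"malonyl-CoA": "acetate", "methylmalonyl-CoA": "propionate"}


def predicted_beta_carbon(active_domains: set[str]) -> BetaCarbon:
    """Generic reductive-loop rule for one module; every listed domain is assumed active."""
    if "KR" not in active_domains:
        return BetaCarbon.KETO        # no reduction
    if "DH" not in active_domains:
        return BetaCarbon.HYDROXYL    # KR only
    if "ER" not in active_domains:
        return BetaCarbon.ENOYL       # KR + DH
    return BetaCarbon.METHYLENE       # KR + DH + ER


# AT specificity and reductive domains as listed for EPOS B and EPOS C.
MODULES = {
    "EPOS B": ("methylmalonyl-CoA", {"DH", "KR"}),
    "EPOS C module 1": ("malonyl-CoA", {"KR"}),
    "EPOS C module 2": ("malonyl-CoA", {"KR"}),
    "EPOS C module 3": ("malonyl-CoA", {"DH", "ER", "KR"}),
    "EPOS C module 4": ("methylmalonyl-CoA", {"DH", "ER", "KR"}),
}

for name, (at_specificity, domains) in MODULES.items():
    outcome = predicted_beta_carbon(domains)
    print(f"{name}: {EXTENDER_UNIT[at_specificity]} unit, beta-carbon {outcome.value}")
```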

epoD (nucleotides 43524-54920 of SEQ ID NO:1) codes for EPOS D (SEQ ID NO:6), a type I
polyketide synthase consisting of two modules.

The first module harbors a KS (nucleotides 43626-44885 of SEQ ID NO:1, amino acids 35-454
of SEQ ID NO:6); a methylmalonyl-CoA-specific AT (nucleotides 45204-46166 of SEQ ID NO:1,
amino acids 561-881 of SEQ ID NO:6); a KR (nucleotides 46950-47702 of SEQ ID NO:1, amino
acids 1143-1393 of SEQ ID NO:6); and an ACP (nucleotides 47811-48032 of SEQ ID NO:1, amino
acids 1430-1503 of SEQ ID NO:6). This module incorporates a propionate extender unit (C23
and C6-C5) and reduces the β-keto group at C7 to a hydroxyl group.

The second module harbors a KS (nucleotides 48087-49361 of SEQ ID NO:1, amino acids
1522-1946 of SEQ ID NO:6); a methylmalonyl-CoA-specific AT (nucleotides 49680-50642 of SEQ
ID NO:1, amino acids 2053-2373 of SEQ ID NO:6); a DH (nucleotides 50670-51176 of SEQ ID
NO:1, amino acids 2383-2551 of SEQ ID NO:6); a methyltransferase (MT; nucleotides
51534-52657 of SEQ ID NO:1, amino acids 2671-3045 of SEQ ID NO:6); a KR (nucleotides
53697-54431 of SEQ ID NO:1, amino acids 3392-3636 of SEQ ID NO:6); and an ACP (nucleotides
54540-54758 of SEQ ID NO:1, amino acids 3673-3745 of SEQ ID NO:6). This module incorporates
a propionate extender unit (C21 or C22 and C4-C3) and reduces the β-keto group at C5 to a
hydroxyl group. This reduction is somewhat unexpected, since epothilones contain a keto
group at C5. However, discrepancies of this kind between the deduced reductive
capabilities of PKS modules and the redox state of the corresponding positions in the
final polyketide products have been reported in the literature (see, for example,
Schwecke et al., Proc. Natl. Acad. Sci. USA 92: 7839-7843 (1995) and Schupp et al., FEMS
Microbiology Letters 159: 201-207 (1998)).

An important feature of epothilones is the presence of gem-methyl side groups at C4 (C21
and C22). The second module of EPOS D is predicted to incorporate a propionate unit into
the growing polyketide chain, providing one methyl side chain at C4. This module also
contains a methyltransferase domain integrated into the PKS between the DH and the KR
domains. The arrangement is similar to that of the HMWP1 yersiniabactin synthase (Gehring,
A.M., DeMoll, E., Fetherston, J.D., Mori, I., Mayhew, G.F., Blattner, F.R., Walsh, C.T.,
and Perry, R.D.: Iron acquisition in plague: modular logic in enzymatic biogenesis of
yersiniabactin by Yersinia pestis; Chem. Biol. 5, 573-586, 1998). This MT domain in EPOS
D is proposed to be responsible for the incorporation of the second methyl side group (C21
or C22) at C4.

epoE (nucleotides 54935-62254 of SEQ ID NO:1) codes for EPOS E (SEQ ID NO:7), a type I
polyketide synthase consisting of one module. This module harbors:

- a KS (nucleotides 55028-56284 of SEQ ID NO:1, amino acids 32-450 of SEQ ID NO:7);
- a malonyl-CoA-specific AT (nucleotides 56600-57565 of SEQ ID NO:1, amino acids 556-877
  of SEQ ID NO:7);
- a DH (nucleotides 57593-58087 of SEQ ID NO:1, amino acids 887-1051 of SEQ ID NO:7);
- a probably nonfunctional ER (nucleotides 59366-60304 of SEQ ID NO:1, amino acids
  1478-1790 of SEQ ID NO:7);
- a KR (nucleotides 60362-61099 of SEQ ID NO:1, amino acids 1810-2055 of SEQ ID NO:7);
- an ACP (nucleotides 61211-61426 of SEQ ID NO:1, amino acids 2093-2164 of SEQ ID NO:7);
  and
- a thioesterase (TE) (nucleotides 61427-62254 of SEQ ID NO:1, amino acids 2165-2439 of
  SEQ ID NO:7).

The ER domain in this module harbors an active site motif with some highly unusual amino
acid substitutions that probably render this domain inactive. The module incorporates an
acetate extender unit (C2-C1) and reduces the β-keto group at C3 to an enoyl group.
Epothilones contain a hydroxyl group at C3, so this reduction also appears to be
excessive, as discussed for the second module of EPOS D. The TE domain of EPOS E takes
part in the release and cyclization of the completed polyketide chain via lactonization
between the carboxyl group of C1 and the hydroxyl group of C15.

Five ORFs are detected upstream of epoA in the sequenced region. The partially sequenced
orf1 has no homologues in the sequence databanks. The deduced protein product (Orf2, SEQ
ID NO:10) of orf2 (nucleotides 3171-1900 on the reverse complement strand of SEQ ID NO:1)
shows strong similarities to hypothetical ORFs from Mycobacterium and Streptomyces
coelicolor. It also shows more distant similarities to carboxypeptidases and
DD-peptidases of different bacteria. The deduced protein product of orf3 (nucleotides
3415-5556 of SEQ ID NO:1), Orf3 (SEQ ID NO:11), shows homologies to Na/H antiporters of
different bacteria. Orf3 might take part in the export of epothilones from the producer
strain. orf4 and orf5 have no homologues in the sequence databanks.

Eleven ORFs are found downstream of epoE in the sequenced region. epoF (nucleotides
62369-63628 of SEQ ID NO:1) codes for EPOS F (SEQ ID NO:8), a deduced protein with strong
sequence similarities to cytochrome P450 oxygenases. EPOS F may take part in adjusting the
redox state of carbons C12, C5, and/or C3. The deduced protein product of orf14
(nucleotides 67334-68251 of SEQ ID NO:1), Orf14 (SEQ ID NO:22), shows strong similarities
to two proteins: GI:3293544, a hypothetical protein of unknown function from Streptomyces
coelicolor, and GI:2654559, the human embryonic lung protein. It is also more distantly
related to cation efflux system proteins such as GI:2623026 from Methanobacterium
thermoautotrophicum, so it might also take part in the export of epothilones from the
producing cells. The remaining ORFs (orf6-orf13 and orf15) show no homologies to entries
in the sequence databanks.



Example 13: Recombinant Expression of Epothilone Biosynthesis Genes
Epothilone synthase genes according to the present invention are expressed in
heterologous organisms in order to produce epothilones in greater quantities than can be
achieved by fermentation of Sorangium cellulosum. A preferred host for heterologous
expression is Streptomyces, e.g. Streptomyces coelicolor, which natively produces the
polyketide actinorhodin. Techniques for recombinant PKS gene expression in this host are
described in McDaniel et al., Science 262: 1546-1550 (1993) and Kao et al., Science 265:
509-512 (1994). See also Holmes et al., EMBO Journal 12(8): 3183-3191 (1993) and Bibb et
al., Gene 38: 215-226 (1985), as well as U.S. Patent Nos. 5,521,077, 5,672,491, and
5,712,146, which are incorporated herein by reference.

According to one method, the heterologous host strain is engineered to contain a
chromosomal deletion of the actinorhodin (act) gene cluster. Expression plasmids containing
the epothilone synthase genes of the invention are constructed by transferring DNA from a
temperature-sensitive donor plasmid to a recipient shuttle vector in E. coli (McDaniel et
al. (1993) and Kao et al. (1994)), so that the synthase genes are built up by homologous
recombination within the vector. Alternatively, the epothilone synthase gene cluster is
introduced into the vector by restriction fragment ligation. Following selection, e.g. as
described in Kao et al. (1994), DNA from the vector is introduced into the act-minus
Streptomyces coelicolor strain according to protocols set forth in Hopwood et al.,
Genetic Manipulation of Streptomyces: A Laboratory Manual (John Innes Foundation, Norwich,
United Kingdom, 1985), incorporated herein by reference. The recombinant Streptomyces
strain is grown on R2YE medium (Hopwood et al. (1985)) and produces epothilones.

Alternatively, the epothilone synthase genes according to the present invention are
expressed in other host organisms such as pseudomonads, Bacillus, yeast/insect cells
and/or E. coli. PKS and NRPS genes are preferably expressed in E. coli using the pT7-7
vector, which uses the T7 promoter. See Tabor et al., Proc. Natl. Acad. Sci. USA 82:
1074-1078 (1985). In another embodiment, the expression vectors pKK223-3 and pKK223-2 are
used to express PKS and NRPS genes in E. coli, either in transcriptional or translational
fusion, behind the tac or trc promoter.

Some heterologous hosts do not naturally have the phosphopantetheinyl (P-pant)
transferases needed for post-translational modification of PKS enzymes. Expression of PKS
and NRPS genes in such hosts requires coexpression of a P-pant transferase in the host,
as described by Kealey et al., Proc. Natl. Acad. Sci. USA 95: 505-509 (1998).
Example 14: Isolation of Epothilones from Producing Strains

Examples of cultivation, fermentation, and extraction procedures for polyketide isolation
are given in the following documents, each incorporated herein by reference:

- WO 93/10121;
- Example 57 of U.S. Patent No. 5,639,949;
- Gerth et al., J. Antibiotics 49: 560-563 (1996);
- Swiss patent application no. 396/98, filed February 19, 1998; and
- U.S. patent application no. 09/248,910, which also discloses preferred mutant strains of
  Sorangium cellulosum.

These procedures are useful for extracting epothilones from both native and recombinant
hosts according to the present invention. The following procedures are useful for
isolating epothilones from cultured Sorangium cellulosum strains such as So ce90, and may
also be used to isolate epothilones from recombinant hosts.

A: Cultivation of epothilone-producing strains:

Strain: Sorangium cellulosum So ce90 or a recombinant host strain according to the present invention.

Preservation of the strain: in liquid N2.

Media: Precultures and intermediate cultures: G52 

Main culture: 1B12 

G52 Medium:

yeast extract, low in salt (BioSpringer, Maison Alfort, France)   2 g/l
MgSO4 (7 H2O)   1 g/l

CaCl2 (2 H2O)   1 g/l
soya meal defatted Soyamine 50T (Lucas Meyer, Hamburg, Germany)   2 g/l
potato starch Noredux A-150 (Blattmann, Waedenswil, Switzerland)   8 g/l
glucose anhydrous   2 g/l
EDTA-Fe(III)-Na salt (8 g/l)   1 mM
pH 7.4, corrected with KOH
Sterilisation: 20 min, 120 °C

1B12 Medium:

potato starch Noredux A-150 (Blattmann, Waedenswil, Switzerland)   20 g/l
soya meal defatted Soyamine 50T (Lucas Meyer, Hamburg, Germany)   11 g/l
EDTA-Fe(III)-Na salt   8 mg/l
pH 7.8, corrected with KOH
Sterilisation: 20 min, 120 °C

Addition of cyclodextrins and cyclodextrin derivatives:

Cyclodextrins (Fluka, Buchs, Switzerland, or Wacker Chemie, Munich, Germany) in different
concentrations are sterilised separately and added to the 1B12 medium prior to seeding.

Cultivation: The culture is built up in three steps, each incubated in an agitator at 180
rpm and 30°C:

1. 1 ml of the suspension of Sorangium cellulosum So ce90 from a liquid N2 ampoule is
   transferred to 10 ml of G52 medium (in a 50 ml Erlenmeyer flask) and incubated for 3
   days, 25 mm displacement.
2. 5 ml of this culture is added to 45 ml of G52 medium (in a 200 ml Erlenmeyer flask) and
   incubated for 3 days, 25 mm displacement.
3. 50 ml of this culture is then added to 450 ml of G52 medium (in a 2 litre Erlenmeyer
   flask) and incubated for 3 days, 50 mm displacement.

Maintenance culture: The culture is subcultured every 3-4 days by adding 50 ml of culture
to 450 ml of G52 medium (in a 2 litre Erlenmeyer flask). All experiments and fermentations
are started from this maintenance culture.



Tests in a flask: 

(i) Preculture in an agitating flask:

Starting with the 500 ml of maintenance culture, 1 x 450 ml of G52 medium is seeded with
50 ml of the maintenance culture and incubated for 4 days at 180 rpm in an agitator at
30°C, 50 mm displacement.
(ii) Main culture in the agitating flask:

40 ml of 1B12 medium plus 5 g/l 4-morpholine-propane-sulfonic acid (= MOPS) powder (in a 
200 ml Erlenmeyer flask) are mixed with 5 ml of a 10x concentrated cyclodextrin solution, 
seeded with 10 ml of preculture and incubated for 5 days at 180 rpm in an agitator at 30°C, 
50 mm displacement. 

Fermentation: Fermentations are carried out on a scale of 10 litres, 100 litres and 500
litres. The 20 litre and 100 litre fermentations serve as intermediate culture steps. The
precultures and intermediate cultures are seeded at 10% (v/v), like the maintenance
culture, whereas the main cultures are seeded with 20% (v/v) of the intermediate culture.
Important: in contrast to the agitating-flask cultures, the amounts of medium ingredients
for the fermentations are calculated on the final culture volume, including the inoculum.
If, for example, 18 litres of medium and 2 litres of inoculum are combined, then
ingredients for 20 litres are weighed in, but are dissolved in only 18 litres.
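The weighing rule above can be stated as: grams of ingredient = concentration × (medium volume + inoculum volume). The minimal sketch below implements that rule. Applying the 1B12 recipe to the 18 + 2 litre example is an illustrative choice, since the example itself does not name a medium. The function name is hypothetical.

```python
def weigh_in(recipe_g_per_l: dict[str, float], medium_l: float,
             inoculum_l: float) -> dict[str, float]:
    """Grams of each medium ingredient for one fermenter batch.

    Concentrations refer to the final culture volume (medium + inoculum),
    even though the ingredients are dissolved in the medium volume only.
    """
    if medium_l <= 0 or inoculum_l < 0:
        raise ValueError("medium volume must be positive, inoculum non-negative")
    final_l = medium_l + inoculum_l
    return {name: conc * final_l for name, conc in recipe_g_per_l.items()}


# 1B12 main-culture medium, concentrations in g/l.
MEDIUM_1B12 = {
    "potato starch Noredux A-150": 20.0,
    "soya meal Soyamine 50T": 11.0,
    "EDTA-Fe(III)-Na salt": 0.008,
}
print(weigh_in(MEDIUM_1B12, medium_l=18, inoculum_l=2))
```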

Preculture in an agitating flask: 

Starting with the 500 ml maintenance culture, 4 x 450 ml of G52 medium (each in a 2 litre
Erlenmeyer flask) are each seeded with 50 ml thereof, and incubated for 4 days at 180 rpm
in an agitator at 30°C, 50 mm displacement.

Intermediate culture: 20 litres or 100 litres:

20 litres: 18 litres of G52 medium in a fermenter having a total volume of 30 litres are
seeded with 2 litres of the preculture. Cultivation lasts for 3-4 days, and the conditions are: 
30°C, 250 rpm, 0.5 litres of air per litre liquid per min, 0.5 bars excess pressure, no pH 
control. 

100 litres: 90 litres of G52 medium in a fermenter having a total volume of 150 litres are
seeded with 10 litres of the 20 litre intermediate culture. Cultivation lasts for 3-4 days, and 
the conditions are: 30°C, 150 rpm, 0.5 litres of air per litre liquid per min, 0.5 bars excess 
pressure, no pH control. 
Main culture: 10 litres, 100 litres or 500 litres:

10 litres: The media substances for 10 litres of 1B12 medium are sterilised in 7 litres of
water. Then 1 litre of a sterile 10% 2-(hydroxypropyl)-β-cyclodextrin solution is added,
and the medium is seeded with 2 litres of a 20 litre intermediate culture. The main
culture lasts 6-7 days under the following conditions: 30°C, 250 rpm, 0.5 litres of air
per litre of liquid per min, 0.5 bars excess pressure, pH control with H2SO4/KOH to pH
7.6 +/- 0.5 (i.e. no control between pH 7.1 and 8.1).

100 litres: The media substances for 100 litres of 1B12 medium are sterilised in 70 litres
of water. Then 10 litres of a sterile 10% 2-(hydroxypropyl)-β-cyclodextrin solution are
added, and the medium is seeded with 20 litres of a 20 litre intermediate culture. The
main culture lasts 6-7 days under the following conditions: 30°C, 200 rpm, 0.5 litres of
air per litre liquid per min, 0.5 bars excess pressure, pH control with H2SO4/KOH to pH
7.6 +/- 0.5. The seeding chain for a 100 litre fermentation is shown schematically as
follows:

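maintenance culture (500 ml, G52 medium) --10%--> maintenance culture (500 ml, G52 medium)
maintenance culture (500 ml, G52 medium) --10%--> precultures (4 x 500 ml, G52 medium)
precultures (4 x 500 ml, G52 medium) --10%--> intermediate culture (e.g. 20 l, G52 medium)
intermediate culture (e.g. 20 l, G52 medium) --20%--> main culture (e.g. 100 l, medium + HP-β-CD)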



500 litres: The media substances for 500 litres of 1B12 medium are sterilised in 350 litres of
water, then 50 litres of a sterile 10% 2-(hydroxypropyl)-β-cyclodextrin solution are added,
and the medium is seeded with 100 litres of a 100 litre intermediate culture. The main culture
runs for 6-7 days under the following conditions: 30°C, 120 rpm, 0.5 litres of air per litre
of liquid per min, 0.5 bar excess pressure, pH control with H2SO4/KOH to pH 7.6 ± 0.5.



Product analysis: 
Preparation of the sample: 



WO 99/66028 



PCT/EP99/04171 



-52- 



50 ml samples are mixed with 2 ml of the polystyrene resin Amberlite XAD16 (Rohm & Haas,
Frankfurt, Germany) and shaken at 180 rpm for one hour at 30°C. The resin is then collected
on a 150 µm nylon sieve, washed with a little water and transferred together with the filter
to a 15 ml Nunc tube.
Elution of the product from the resin:

10 ml of isopropanol (>99%) are added to the tube containing the filter and the resin. The
sealed tube is then shaken for 30 minutes at room temperature on a Rota-Mixer (Labinco BV,
Netherlands). Then 2 ml of the liquid are centrifuged and the supernatant is transferred by
pipette to HPLC tubes.
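When the eluate is quantified by HPLC, the measured concentration has to be scaled back to the culture volume. The sketch below shows one way to do this; the function name and the `recovery` parameter are hypothetical, and the default of complete adsorption and elution is an assumption that the description does not state.

```python
def broth_titre_mg_per_l(eluate_mg_per_l: float,
                         sample_ml: float = 50.0,
                         eluent_ml: float = 10.0,
                         recovery: float = 1.0) -> float:
    """Convert an epothilone concentration measured in the isopropanol eluate
    back to the concentration in the culture broth.

    recovery is the assumed fraction of product carried from the broth into
    the eluate (1.0 = complete); liquid retained by resin and filter is ignored.
    """
    if not 0 < recovery <= 1:
        raise ValueError("recovery must be in (0, 1]")
    # Product from sample_ml of broth ends up in eluent_ml of isopropanol,
    # so the eluate is concentrated by sample_ml / eluent_ml (5-fold here).
    concentration_factor = sample_ml / eluent_ml
    return eluate_mg_per_l / (concentration_factor * recovery)


print(broth_titre_mg_per_l(25.0))  # 5.0
```

With the stated volumes, 50 ml of broth is eluted into 10 ml of isopropanol, so an eluate reading of 25 mg/l corresponds to 5 mg/l in the culture if recovery is complete.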
HPLC analysis:

Column:          Waters Symmetry C18, 100 x 4 mm, 3.5 µm (WAT066220)
                 + precolumn 3.9 x 20 mm (WAT054225)
Solvents:        A: 0.02% phosphoric acid
                 B: acetonitrile (HPLC quality)
Gradient:        41% B from 0 to 7 min
                 100% B from 7.2 to 7.8 min
                 41% B from 8 to 12 min
Oven temp.:      30°C
Detection:       250 nm, UV-DAD detection
Injection vol.:  10 µl
Retention time:  Epo A: 4.30 min; Epo B: 5.38 min
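The gradient table lists hold points only. The sketch below turns it into a function of time, assuming (not stated in the source) that the proportion of solvent B changes linearly during the 7.0-7.2 min and 7.8-8.0 min transitions; `percent_b` and `GRADIENT` are illustrative names.

```python
from bisect import bisect_right

# (time [min], % B) set points from the gradient table. Linear ramps between
# consecutive set points are an assumption, not part of the source.
GRADIENT = [(0.0, 41.0), (7.0, 41.0), (7.2, 100.0), (7.8, 100.0),
            (8.0, 41.0), (12.0, 41.0)]


def percent_b(t_min: float) -> float:
    """Return the acetonitrile proportion (% B) at time t_min."""
    times = [t for t, _ in GRADIENT]
    if not times[0] <= t_min <= times[-1]:
        raise ValueError("time outside the 0-12 min programme")
    i = bisect_right(times, t_min) - 1
    if i == len(GRADIENT) - 1:  # exactly at the final set point
        return GRADIENT[-1][1]
    (t0, b0), (t1, b1) = GRADIENT[i], GRADIENT[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


for name, rt in [("Epo A", 4.30), ("Epo B", 5.38)]:
    print(name, rt, percent_b(rt))
```

Under this reading both retention times fall within the isocratic 41% B segment, so the short 100% B wash step follows after the epothilone peaks have eluted.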



B: Effect of the addition of cyclodextrin and cyclodextrin derivatives on the epothilone
concentrations attained.

Cyclodextrins are cyclic (α-1,4)-linked oligosaccharides of α-D-glucopyranose with a
relatively hydrophobic central cavity and a hydrophilic external surface.

The following are distinguished in particular (the figures in parentheses give the
number of glucose units per molecule): α-cyclodextrin (6), β-cyclodextrin (7), γ-cyclodextrin
(8), δ-cyclodextrin (9), ε-cyclodextrin (10), ζ-cyclodextrin (11), η-cyclodextrin (12) and
θ-cyclodextrin (13). Especially preferred are δ-cyclodextrin and in particular α-cyclodextrin,
β-cyclodextrin or γ-cyclodextrin, or mixtures thereof.
Cyclodextrin derivatives are primarily derivatives of the above-mentioned cyclodextrins,
especially of α-cyclodextrin, β-cyclodextrin or γ-cyclodextrin, primarily those in which
one or more up to all of the hydroxy groups (3 per glucose radical) are etherified or
esterified. Ethers are primarily alkyl ethers, especially lower alkyl, such as methyl or ethyl
ether, also propyl or butyl ether; the aryl-hydroxyalkyl ethers, such as phenyl-hydroxy-lower-
alkyl, especially phenyl-hydroxyethyl ether; the hydroxyalkyl ethers, in particular hydroxy-
lower-alkyl ethers, especially 2-hydroxyethyl, hydroxypropyl such as 2-hydroxypropyl, or
hydroxybutyl such as 2-hydroxybutyl ether; the carboxyalkyl ethers, in particular carboxy-
lower-alkyl ethers, especially carboxymethyl or carboxyethyl ether; derivatised carboxyalkyl
ethers, in particular derivatised carboxy-lower-alkyl ethers in which the derivatised carboxy
is esterified or amidated carboxy (primarily aminocarbonyl, mono- or di-lower-alkyl-
aminocarbonyl, morpholino-, piperidino-, pyrrolidino- or piperazino-carbonyl, or
alkyloxycarbonyl), in particular lower alkoxycarbonyl-lower-alkyl ether, for example
methyloxycarbonylpropyl ether or ethyloxycarbonylpropyl ether; the sulfoalkyl ethers, in
particular sulfo-lower-alkyl ethers, especially sulfobutyl ether; cyclodextrins in which one or
more OH groups are etherified with a radical of formula

-O-[alk-O-]n-H

wherein alk is alkyl, especially lower alkyl, and n is a whole number from 2 to 12, especially 
2 to 5, in particular 2 or 3; cyclodextrins in which one or more OH groups are etherified with 
a radical of formula 

R' 

(Alk-O)- Alk— ^ 

wherein R' is hydrogen, hydroxy, -O-(alk-O)z-H, -O-(alk(-R)-O-)p-H or
-O-(alk(-R)-O-)q-alk-CO-Y; alk in all cases is alkyl, especially lower alkyl; m, n, p, q and z
are each a whole number from 1 to 12, preferably 1 to 5, in particular 1 to 3; and Y is OR1 or
NR2R3, wherein R1, R2 and R3, independently of one another, are hydrogen or lower alkyl, or R2
and R3 together with the linking nitrogen signify morpholino, piperidino, pyrrolidino or
piperazino;

or branched cyclodextrins, in which etherifications or acetals with other sugar molecules are
present, especially glucosyl-, diglucosyl- (G2-β-cyclodextrin), maltosyl- or dimaltosyl-
cyclodextrin, or N-acetylglucosaminyl-, glucosaminyl-, N-acetylgalactosaminyl- or
galactosaminyl-cyclodextrin.
Esters are primarily alkanoyl esters, in particular lower alkanoyl esters, such as acetyl
esters of cyclodextrins.

It is also possible to have cyclodextrins in which two or more different said ether and 
ester groups are present at the same time. 

Mixtures of two or more of the said cyclodextrins and/or cyclodextrin derivatives may
also be used.

Preference is given in particular to α-, β- or γ-cyclodextrins or the lower alkyl ethers
thereof, such as methyl-β-cyclodextrin or in particular 2,6-di-O-methyl-β-cyclodextrin, or in
particular the hydroxy-lower-alkyl ethers thereof, such as 2-hydroxypropyl-α-, 2-hydroxy-
propyl-β- or 2-hydroxypropyl-γ-cyclodextrin.

The cyclodextrins or cyclodextrin derivatives are added to the culture medium
preferably in a concentration of 0.02 to 10, preferably 0.05 to 5, especially 0.1 to 4, for
example 0.1 to 2 percent by weight (w/v).
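The glucose-unit counts listed above also fix the molar mass of the native cyclodextrins, which lets the % (w/v) ranges be expressed as molar concentrations. The sketch below assumes an anhydroglucose unit mass of 162.14 g/mol (C6H10O5) and applies only to unsubstituted cyclodextrins; derivatives such as 2-hydroxypropyl-β-cyclodextrin have a molar mass that depends on the degree of substitution. The function names are hypothetical.

```python
# Molar mass of one anhydroglucose unit, C6H10O5 (g/mol).
ANHYDROGLUCOSE_G_PER_MOL = 162.14

# Glucose units per ring, as listed in the description.
GLUCOSE_UNITS = {"alpha": 6, "beta": 7, "gamma": 8, "delta": 9,
                 "epsilon": 10, "zeta": 11, "eta": 12, "theta": 13}


def molar_mass(n_units: int) -> float:
    """Molar mass (g/mol) of an unsubstituted cyclodextrin with n units.

    The ring has no free ends, so no terminal water is added: the mass is
    simply n times the anhydroglucose unit.
    """
    return n_units * ANHYDROGLUCOSE_G_PER_MOL


def millimolar(percent_wv: float, n_units: int) -> float:
    """Convert a % (w/v) concentration to mmol/l (1% w/v = 10 g/l)."""
    grams_per_litre = percent_wv * 10.0
    return 1000.0 * grams_per_litre / molar_mass(n_units)


for name in ("alpha", "beta", "gamma"):
    n = GLUCOSE_UNITS[name]
    print(f"{name}: {molar_mass(n):.2f} g/mol, "
          f"1% w/v = {millimolar(1.0, n):.2f} mM")
```

At 1% (w/v), for example, native β-cyclodextrin corresponds to roughly 8.8 mM.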

Cyclodextrins or cyclodextrin derivatives are known or may be produced by known
processes (see for example US 3,459,731; US 4,383,992; US 4,535,152; US 4,659,696;
EP 0 094 157; EP 0 149 197; EP 0 197 571; EP 0 300 526; EP 0 320 032; EP 0 499 322;
EP 0 503 710; EP 0 818 469; WO 90/12035; WO 91/11200; WO 93/19061; WO 95/08993;
WO 96/14090; GB 2,189,245; DE 3,118,218; DE 3,317,064 and the references mentioned therein,
which also refer to the synthesis of cyclodextrins or cyclodextrin derivatives, or also:
T. Loftsson and M.E. Brewster (1996): Pharmaceutical Applications of Cyclodextrins: Drug
Solubilization and Stabilization. Journal of Pharmaceutical Sciences 85(10): 1017-1025;
R.A. Rajewski and V.J. Stella (1996): Pharmaceutical Applications of Cyclodextrins: In Vivo
Drug Delivery. Journal of Pharmaceutical Sciences 85(11): 1142-1169).

All the cyclodextrin derivatives tested here are obtainable from Fluka, Buchs,
Switzerland. The tests are carried out in 200 ml shake flasks with a culture volume of 50 ml.
Flasks containing the adsorber resin Amberlite XAD-16 (Rohm & Haas, Frankfurt, Germany) and
flasks without any adsorber serve as controls. After incubation for 5 days, the following
epothilone titres are determined by HPLC:



Table 2:

Addition                          Order No.  Conc. [% w/v]1)  Epo A [mg/l]  Epo B [mg/l]
Amberlite XAD-16                             2.0 (% v/v)      9.2           3.8
2-hydroxypropyl-β-cyclodextrin    56332      0.1              2.7           1.7
2-hydroxypropyl-β-cyclodextrin    56332      0.5              4.7           3.3
2-hydroxypropyl-β-cyclodextrin    56332      1.0              4.7           3.4
2-hydroxypropyl-β-cyclodextrin    56332      2.0              4.7           4.1
2-hydroxypropyl-β-cyclodextrin    56332      5.0              1.7           0.5
2-hydroxypropyl-α-cyclodextrin    56330      0.5              1.2           1.2
2-hydroxypropyl-α-cyclodextrin    56330      1.0              1.2           1.2
2-hydroxypropyl-α-cyclodextrin    56330      5.0              2.5           2.3
β-cyclodextrin                    28707      0.1              1.6           1.3
β-cyclodextrin                    28707      0.5              3.6           2.5
β-cyclodextrin                    28707      1.0              4.8           3.7
β-cyclodextrin                    28707      2.0              4.8           2.9
β-cyclodextrin                    28707      5.0              1.1           0.4
methyl-β-cyclodextrin             66292      0.5              0.8           <0.3
methyl-β-cyclodextrin             66292      1.0              <0.3          <0.3
methyl-β-cyclodextrin             66292      2.0              <0.3          <0.3
2,6-di-O-methyl-β-cyclodextrin    39915      1.0              <0.3          <0.3
2-hydroxypropyl-γ-cyclodextrin    56334      0.1              0.3           <0.3
2-hydroxypropyl-γ-cyclodextrin    56334      0.5              0.9           0.8
2-hydroxypropyl-γ-cyclodextrin    56334      1.0              1.1           0.7
2-hydroxypropyl-γ-cyclodextrin    56334      2.0              2.6           0.7
2-hydroxypropyl-γ-cyclodextrin    56334      5.0              5.0           1.1
no addition                                                   0.5           0.5

1) Apart from Amberlite (% v/v), all percentages are by weight (% w/v).

Some of the cyclodextrins tested (2,6-di-O-methyl-β-cyclodextrin, methyl-β-cyclodextrin)
show no effect or a negative effect on epothilone production at the concentrations used.
1-2% 2-hydroxypropyl-β-cyclodextrin and β-cyclodextrin increase epothilone production in
these examples 6- to 8-fold compared with production without cyclodextrins.
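The fold increases can be recomputed from Table 2. The text does not say how the factor was calculated, so the sketch below assumes one possible metric: the combined epothilone A + B titre divided by the combined titre without addition. With this metric the 1% and 2% β-cyclodextrin and 2-hydroxypropyl-β-cyclodextrin cultures come out at roughly eight-fold; per-compound ratios give a somewhat wider spread. The names are illustrative.

```python
# Epothilone titres (mg/l) from Table 2 at 1% and 2% (w/v), and the control.
NO_ADDITION = {"A": 0.5, "B": 0.5}
TITRES = {
    ("2-hydroxypropyl-beta-cyclodextrin", 1.0): {"A": 4.7, "B": 3.4},
    ("2-hydroxypropyl-beta-cyclodextrin", 2.0): {"A": 4.7, "B": 4.1},
    ("beta-cyclodextrin", 1.0): {"A": 4.8, "B": 3.7},
    ("beta-cyclodextrin", 2.0): {"A": 4.8, "B": 2.9},
}


def fold_increase(titres: dict, control: dict) -> float:
    """Combined epothilone A + B titre relative to the combined control titre."""
    return sum(titres.values()) / sum(control.values())


for (additive, conc), titres in TITRES.items():
    print(f"{additive} {conc}%: {fold_increase(titres, NO_ADDITION):.1f}x")
```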
C: 10 litre fermentation with 1% 2-(hydroxypropyl)-β-cyclodextrin:

Fermentation is carried out in a 15 litre glass fermenter. The medium contains 10 g/l of
2-(hydroxypropyl)-β-cyclodextrin from Wacker Chemie, Munich, Germany. The progress of the
fermentation is shown in Table 3. Fermentation is ended after 6 days and the culture is
worked up.

Table 3: Progress of a 10 litre fermentation

Duration of culture [d]    Epothilone A [mg/l]    Epothilone B [mg/l]
0                          0                      0
1                          0                      0
2                          0.5                    0.3
3                          1.8                    2.5
4                          3.0                    5.1
5                          3.7                    5.9
6                          3.6                    5.7



D: 100 litre fermentation with 1% 2-(hydroxypropyl)-β-cyclodextrin:

Fermentation is carried out in a 150 litre fermenter. The medium contains 10 g/l of
2-(hydroxypropyl)-β-cyclodextrin. The progress of the fermentation is shown in Table 4. The
fermentation is harvested after 7 days and worked up.

Table 4: Progress of a 100 litre fermentation

Duration of culture [d]    Epothilone A [mg/l]    Epothilone B [mg/l]
0                          0                      0
1                          0                      0
2                          0.3                    0
3                          0.9                    1.1
4                          1.5                    2.3
5                          1.6                    3.3
6                          1.8                    3.7
7                          1.8                    3.5



E: 500 litre fermentation with 1% 2-(hydroxypropyl)-β-cyclodextrin:

Fermentation is carried out in a 750 litre fermenter. The medium contains 10 g/l of
2-(hydroxypropyl)-β-cyclodextrin. The progress of the fermentation is shown in Table 5. The
fermentation is harvested after 7 days and worked up.

Table 5: Progress of a 500 litre fermentation

Duration of culture [d]    Epothilone A [mg/l]    Epothilone B [mg/l]
0                          0                      0
1                          0                      0
2                          6                      0
3                          0.6                    0.6
4                          1.7                    2.2
5                          3.1                    4.5
6                          3.1                    5.1



F: Comparison example: 10 litre fermentation without adding an adsorber:

Fermentation is carried out in a 15 litre glass fermenter. The medium does not contain
any cyclodextrin or other adsorber. The progress of the fermentation is shown in Table 6.
This fermentation is neither harvested nor worked up.

Table 6: Progress of a 10 litre fermentation without adsorber

Duration of culture [d]    Epothilone A [mg/l]    Epothilone B [mg/l]
0                          0                      0
1                          0                      0
2                          0                      0
3                          0                      0
4                          0.7                    0.7
5                          0.7                    1.0
6                          0.8                    1.3
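Tables 3 and 6 can be compared directly because both describe 10 litre fermentations in a 15 litre glass fermenter. The Python sketch below (names hypothetical) takes the last row of Table 3 as day 6, since that fermentation was ended after 6 days, and reports when epothilone is first detected and the ratio of the titres on day 6.

```python
# Day-by-day titres (mg/l) as (epothilone A, epothilone B).
WITH_HP_BETA_CD = {0: (0, 0), 1: (0, 0), 2: (0.5, 0.3), 3: (1.8, 2.5),
                   4: (3.0, 5.1), 5: (3.7, 5.9), 6: (3.6, 5.7)}   # Table 3
WITHOUT_ADSORBER = {0: (0, 0), 1: (0, 0), 2: (0, 0), 3: (0, 0),
                    4: (0.7, 0.7), 5: (0.7, 1.0), 6: (0.8, 1.3)}  # Table 6


def first_day_detected(series: dict) -> int:
    """First culture day on which either epothilone is above zero."""
    return min(day for day, (a, b) in series.items() if a > 0 or b > 0)


def ratio_on_day(day: int) -> tuple:
    """Titre with cyclodextrin divided by titre without adsorber, per compound."""
    (a1, b1), (a0, b0) = WITH_HP_BETA_CD[day], WITHOUT_ADSORBER[day]
    return a1 / a0, b1 / b0


print(first_day_detected(WITH_HP_BETA_CD), first_day_detected(WITHOUT_ADSORBER))
ratio_a, ratio_b = ratio_on_day(6)
print(f"day 6: A {ratio_a:.1f}x, B {ratio_b:.1f}x")
```

On this reading, epothilone appears two days earlier with 2-(hydroxypropyl)-β-cyclodextrin, and the day-6 titres are roughly 4.4 to 4.5 times higher.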



G: Working up of the epothilones: isolation from a 500 litre main culture:

The harvest from the 500 litre main culture of example 2D amounts to 450 litres and
is separated using a Westfalia clarifying separator Type SA-20-06 (rpm = 6500) into the
liquid phase (centrifugate + rinsing water = 650 litres) and the solid phase (cells = ca. 15 kg).
Most of the epothilone is found in the centrifugate; the centrifuged cell pulp contains < 15%
of the determined epothilone and is not processed further. The 650 litre centrifugate is then
placed in a 4000 litre stirring vessel, mixed with 10 litres of Amberlite XAD-16
(centrifugate:resin volume = 65:1) and stirred. After a contact time of ca. 2 hours, the resin
is separated in a Heine overflow centrifuge (basket content 40 litres; rpm = 2800). The resin
is discharged from the centrifuge and washed with 10-15 litres of deionised water. Desorption
is effected by stirring the resin twice, each time with a 30 litre portion of isopropanol, in
30 litre glass stirring vessels for 30 minutes. The isopropanol phase is separated from the
resin using a suction filter. The isopropanol is then removed from the combined isopropanol
phases in a vacuum-operated circulating evaporator (Schmid-Verdampfer) after adding 15-20
litres of water, and the resulting aqueous phase of ca. 10 litres is extracted three times,
each time with 10 litres of ethyl acetate. Extraction is effected in 30 litre glass stirring
vessels. The ethyl acetate extract is concentrated to 3-5 litres in a vacuum-operated
circulating evaporator (Schmid-Verdampfer) and then evaporated to dryness under vacuum in a
rotary evaporator (Büchi type). This gives 50.2 g of ethyl acetate extract. The extract is
dissolved in 500 ml of methanol, the insoluble portions are filtered off using a folded filter,
and the solution is applied to a 10 kg Sephadex LH 20 column (Pharmacia, Uppsala, Sweden)
(column diameter 20 cm, filling level ca. 1.2 m). Elution is effected with methanol.
Epothilones A and B are present predominantly in fractions 21-23 (fraction size 1 litre).
These fractions are concentrated to dryness under vacuum on a rotary evaporator (total
weight 9.0 g). The Sephadex peak fractions (9.0 g) are then dissolved in 92 ml of
acetonitrile:water:methylene chloride = 50:40:2, and the solution is filtered through a folded
filter and applied to an RP column (equipment Prepbar 200, Merck; 2.0 kg LiChrospher RP-18,
Merck, grain size 12 µm, column diameter 10 cm, filling level 42 cm; Merck, Darmstadt,
Germany). Elution is effected with acetonitrile:water = 3:7 (flow rate = 500 ml/min; retention
time of epothilone A = ca. 51-59 min; retention time of epothilone B = ca. 60-69 min).
Fractionation is monitored with a UV detector at 250 nm. The fractions are concentrated to
dryness under vacuum on a Büchi Rotavapor rotary evaporator. The epothilone A peak fraction
weighs 700 mg and, according to HPLC (external standard), has a content of 75.1%; the
epothilone B peak fraction weighs 1980 mg and has a content of 86.6% according to HPLC
(external standard). Finally, the epothilone A fraction (700 mg) is crystallised from 5 ml of
ethyl acetate:toluene = 2:3, yielding 170 mg of pure epothilone A crystallisate [content
according to HPLC (% of area) = 94.3%]. Crystallisation of the epothilone B fraction (1980 mg)
from 18 ml of methanol yields 1440 mg of pure epothilone B crystallisate [content according to
HPLC (% of area) = 99.2%]. m.p. (epothilone B): e.g. 124-125 °C; ¹H-NMR data for epothilone B:

500 MHz NMR, solvent: DMSO-d6. Chemical shift δ in ppm relative to TMS. s = singlet;
d = doublet; dd = doublet of doublets; m = multiplet

δ (multiplicity)               Integral (number of H)
7.34 (s)                       1
6.50 (s)                       1
5.28 (d)                       1
5.08 (d)                       1
4.46 (d)                       1
4.08 (m)                       1
3.47 (m)                       1
3.11 (m)                       1
2.83 (dd)                      1
2.64 (s)                       3
2.36 (m)                       2
2.09 (s)                       3
2.04 (m)                       1
1.83 (m)                       1
1.61 (m)                       1
1.47-1.24 (m)                  4
1.18 (s)                       6
1.13 (m)                       2
1.06 (d)                       3
0.89 (d + s, overlapping)      6
                               Σ = 41
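Two consistency checks follow from the figures above. The integrals in the NMR table add up to 41, which matches the 41 hydrogen atoms of epothilone B (C27H41NO6S), and the peak-fraction weights and HPLC contents allow an approximate crystallisation recovery to be estimated. The sketch treats the area-% purity of the crystallisates as a weight content, which is an assumption; `recovery_percent` is a hypothetical helper.

```python
# (chemical shift and multiplicity, number of H) from the 1H-NMR table.
NMR_ROWS = [
    ("7.34 (s)", 1), ("6.50 (s)", 1), ("5.28 (d)", 1), ("5.08 (d)", 1),
    ("4.46 (d)", 1), ("4.08 (m)", 1), ("3.47 (m)", 1), ("3.11 (m)", 1),
    ("2.83 (dd)", 1), ("2.64 (s)", 3), ("2.36 (m)", 2), ("2.09 (s)", 3),
    ("2.04 (m)", 1), ("1.83 (m)", 1), ("1.61 (m)", 1), ("1.47-1.24 (m)", 4),
    ("1.18 (s)", 6), ("1.13 (m)", 2), ("1.06 (d)", 3),
    ("0.89 (d + s, overlapping)", 6),
]

# Epothilone B, C27H41NO6S, carries 41 hydrogen atoms.
EPOTHILONE_B_H_COUNT = 41


def recovery_percent(mass_in_mg: float, purity_in: float,
                     mass_out_mg: float, purity_out: float) -> float:
    """Percentage of pure compound carried from one fraction into the next.

    Purities are fractions (0-1); area-% values are treated as weight
    contents, so the result is only approximate.
    """
    return 100.0 * (mass_out_mg * purity_out) / (mass_in_mg * purity_in)


total_h = sum(n for _, n in NMR_ROWS)
print("sum of integrals:", total_h, "expected:", EPOTHILONE_B_H_COUNT)

# Peak fraction -> crystallisate, using the figures from the work-up.
print(f"epothilone A: {recovery_percent(700, 0.751, 170, 0.943):.0f}%")
print(f"epothilone B: {recovery_percent(1980, 0.866, 1440, 0.992):.0f}%")
```

Under that assumption roughly 30% of the epothilone A and roughly 83% of the epothilone B present in the peak fractions is recovered as crystallisate.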

Example 15: Medical Uses of Recombinantly Produced Epothilones

Pharmaceutical preparations or compositions comprising epothilones are used, for
example, in the treatment of cancerous diseases, such as various human solid tumors.
Such anticancer formulations comprise, for example, an active amount of an epothilone
together with one or more organic or inorganic, liquid or solid, pharmaceutically suitable
carrier materials. Such formulations are delivered, for example, enterally, nasally, rectally,
orally, or parenterally, particularly intramuscularly or intravenously. The dosage of the
active ingredient depends on the weight, age, and physical and pharmacokinetic condition of
the patient, and also on the method of delivery. Because epothilones mimic the biological
effects of taxol, epothilones may be substituted for taxol in compositions and methods
utilizing taxol in the treatment of cancer. See, for example, U.S. Patent Nos. 5,496,804,
5,565,478, and 5,641,803, all of which are incorporated herein by reference.

For example, for treatment, epothilone B is supplied in individual 2 ml glass vials
formulated as 1 mg/1 ml of clear, colorless intravenous concentrate. The substance is
formulated in polyethylene glycol 300 (PEG 300) and diluted with 50 or 100 ml of 0.9%
Sodium Chloride Injection, USP, to achieve the desired final concentration of the drug for
infusion. It is administered as a single 30-minute intravenous infusion every 21 days
(three-weekly treatment) for six cycles, or as a single 30-minute intravenous infusion every
7 days (weekly treatment).

Preferably, for weekly treatment, the dose is between about 0.1 and about 6 mg/m²,
preferably between about 0.1 and about 5 mg/m², more preferably between about 0.1 and about
3 mg/m², even more preferably between 0.1 and 1.7 mg/m², most preferably between about 0.3
and about 1 mg/m²; for three-weekly treatment (treatment every three weeks or every third
week) the dose is between about 0.3 and about 18 mg/m², preferably between about 0.3 and
about 15 mg/m², more preferably between about 0.3 and about 12 mg/m², even more preferably
between about 0.3 and about 7.5 mg/m², still more preferably between about 0.3 and about
5 mg/m², most preferably between about 1.0 and about 3.0 mg/m². This dose is preferably
administered to the human by intravenous (i.v.) administration over 2 to 180 min, preferably
2 to 120 min, more preferably about 5 to about 30 min, most preferably about 10 to about
30 min, e.g. about 30 min.
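Doses in mg/m² are converted to an absolute amount through the patient's body surface area (BSA), and the concentrate volume then follows from the 1 mg/ml formulation. The sketch below is illustrative arithmetic with hypothetical values; the Mosteller formula used for BSA is an assumption (the text does not specify a method), as is the choice to count the added concentrate in the final infusion volume.

```python
import math


def bsa_mosteller(height_cm: float, weight_kg: float) -> float:
    """Body surface area in m^2 by the Mosteller formula (assumed method)."""
    return math.sqrt(height_cm * weight_kg / 3600.0)


def infusion_plan(dose_mg_per_m2: float, bsa_m2: float,
                  concentrate_mg_per_ml: float = 1.0,
                  diluent_ml: float = 100.0) -> tuple:
    """Return (dose in mg, concentrate volume in ml, final mg/ml in the bag).

    Assumes the concentrate volume adds to the 50 or 100 ml of saline.
    """
    dose_mg = dose_mg_per_m2 * bsa_m2
    concentrate_ml = dose_mg / concentrate_mg_per_ml
    final_mg_per_ml = dose_mg / (diluent_ml + concentrate_ml)
    return dose_mg, concentrate_ml, final_mg_per_ml


def vials_needed(concentrate_ml: float, vial_ml: float = 2.0) -> int:
    """Number of 2 ml vials required to supply the concentrate volume."""
    return math.ceil(concentrate_ml / vial_ml)


# Hypothetical patient: 180 cm, 80 kg, weekly dose of 1 mg/m^2.
bsa = bsa_mosteller(180, 80)
dose, volume, final_conc = infusion_plan(1.0, bsa)
print(f"BSA {bsa:.2f} m^2, dose {dose:.2f} mg, {volume:.2f} ml concentrate, "
      f"{vials_needed(volume)} vial(s), {final_conc:.4f} mg/ml")
```

For this hypothetical patient (BSA 2.0 m²) at 1 mg/m², the calculation gives 2 mg, i.e. one full 2 ml vial of concentrate diluted into 100 ml of saline.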

While the present invention has been described with reference to specific
embodiments thereof, it will be appreciated that numerous variations, modifications, and
embodiments are possible, and accordingly, all such variations, modifications and
embodiments are to be regarded as being within the spirit and scope of the present
invention.
BUDAPEST TREATY ON THE INTERNATIONAL
RECOGNITION OF THE DEPOSIT OF MICROORGANISMS
FOR THE PURPOSES OF PATENT PROCEDURE



INTERNATIONAL FORM 



TO 

Novartis AG

Novartis Corporation

Patent and Trademark Dept.

3054 Cornwallis Rd.

Research Triangle Park, NC 27709 

NAME AND ADDRESS 
OF DEPOSITOR 



RECEIPT IN THE CASE OF AN ORIGINAL DEPOSIT 
issued pursuant to Rule 7.1 by the
INTERNATIONAL DEPOSITARY AUTHORITY 
identified at the bottom of this page 



I. IDENTIFICATION OF THE MICROORGANISM



Identification reference given by the 
DEPOSITOR:

INTERNATIONAL DEPOSITARY AUTHORITY:

NRRL B-30033 
[ ] a scientific description

[X] a proposed taxonomic designation

(Mark with a cross where applicable)



III. RECEIPT AND ACCEPTANCE



This International Depositary Authority accepts the microorganism identified under I.
above, which was received by it on June 11, 1998 (date of the original deposit)¹
INTERNATIONAL FORM



TO 

Novartis AG

c/o Novartis Agricultural Biotechnology

Research, Inc.
Patent & Trademark Department
3054 Cornwallis Road
Research Triangle Park, NC 27709
NAME AND ADDRESS
OF DEPOSITOR



RECEIPT IN THE CASE OF AN ORIGINAL DEPOSIT
issued pursuant to Rule 7.1 by the
INTERNATIONAL DEPOSITARY AUTHORITY
identified at the bottom of this page 



I. IDENTIFICATION OF THE MICROORGANISM



Identification reference given by the 
DEPOSITOR: 

Escherichia coli DH10B [pEP032]



Accession number given by the 
INTERNATIONAL DEPOSITARY AUTHORITY:

NRRL B-30113



II. SCIENTIFIC DESCRIPTION AND/OR PROPOSED TAXONOMIC DESIGNATION



The microorganism identified under I. above was accompanied by: 
[ ] a scientific description

[X] a proposed taxonomic designation

(Mark with a cross where applicable)



III. RECEIPT AND ACCEPTANCE 



This International Depositary Authority accepts the microorganism identified under I.
above, which was received by it on April 16, 1999 (date of the original deposit)¹



IV. RECEIPT OF REQUEST FOR CONVERSION



The microorganism identified under I. above was received by this International
Depositary Authority on (date of the original deposit) and a request
to convert the original deposit to a deposit under the Budapest Treaty was received by
it on (date of receipt of request for conversion).



INTERNATIONAL DEPOSITARY AUTHORITY 



Agricultural Research Culture
Collection (NRRL)
International Depositary Authority

Address: 1815 N. University Street

Peoria, Illinois 61604 U.S.A.



Signature(s) of person(s) having the power
to represent the International Depositary
Authority or of authorized official(s):



Date:



¹ Where Rule 6.4(d) applies, such date is the date on which the status of international
depositary authority was acquired.
What is claimed is:

1. An isolated nucleic acid molecule comprising a nucleotide sequence that
encodes at least one polypeptide involved in the biosynthesis of epothilone. 

2. An isolated nucleic acid molecule according to claim 1, wherein said nucleotide
sequence is isolated from a myxobacterium.

3. An isolated nucleic acid molecule according to claim 2, wherein said 
myxobacterium is Sorangium cellulosum. 

4. A chimeric gene comprising a heterologous promoter sequence operatively linked 
to a nucleic acid molecule according to claim 1. 

5. A recombinant vector comprising a chimeric gene according to claim 4. 

6. A recombinant host cell comprising a chimeric gene according to claim 4. 

7. The recombinant host cell of claim 6, which is a bacterium.

8. The recombinant host cell of claim 7, which is an Actinomycete. 

9. The recombinant host cell of claim 8, which is Streptomyces. 

10. A Bac clone comprising a nucleic acid molecule according to claim 1.

11. The Bac clone of claim 10, which is pEP015.

12. An isolated nucleic acid molecule according to claim 1, wherein said polypeptide
comprises an amino acid sequence substantially similar to an amino acid sequence
selected from the group consisting of: SEQ ID NO:2, amino acids 11-437 of SEQ ID NO:2,
amino acids 543-864 of SEQ ID NO:2, amino acids 974-1273 of SEQ ID NO:2, amino acids
1314-1385 of SEQ ID NO:2, SEQ ID NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids
118-125 of SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3, amino acids 353-363 of
SEQ ID NO:3, amino acids 549-565 of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3,
amino acids 669-684 of SEQ ID NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids
868-892 of SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3, amino acids 918-940 of
SEQ ID NO:3, amino acids 1268-1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID
NO:3, amino acids 973-1256 of SEQ ID NO:3, amino acids 1344-1351 of SEQ ID NO:3,
SEQ ID NO:4, amino acids 7-432 of SEQ ID NO:4, amino acids 539-859 of SEQ ID NO:4,
amino acids 869-1037 of SEQ ID NO:4, amino acids 1439-1684 of SEQ ID NO:4, amino
acids 1722-1792 of SEQ ID NO:4, SEQ ID NO:5, amino acids 39-457 of SEQ ID NO:5,
amino acids 563-884 of SEQ ID NO:5, amino acids 1147-1399 of SEQ ID NO:5, amino
acids 1434-1506 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids
2056-2377 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, amino acids 2932-
3005 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 3555-3876 of
SEQ ID NO:5, amino acids 3886-4048 of SEQ ID NO:5, amino acids 4433-4719 of SEQ ID
NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5,
amino acids 5103-5525 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino
acids 5964-6132 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, amino acids
6857-7101 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, SEQ ID NO:6, amino
acids 35-454 of SEQ ID NO:6, amino acids 561-881 of SEQ ID NO:6, amino acids 1143-
1393 of SEQ ID NO:6, amino acids 1430-1503 of SEQ ID NO:6, amino acids 1522-1946 of
SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, amino acids 2383-2551 of SEQ ID
NO:6, amino acids 2671-3045 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6,
amino acids 3673-3745 of SEQ ID NO:6, SEQ ID NO:7, amino acids 32-450 of SEQ ID
NO:7, amino acids 556-877 of SEQ ID NO:7, amino acids 887-1051 of SEQ ID NO:7, amino
acids 1478-1790 of SEQ ID NO:7, amino acids 1810-2055 of SEQ ID NO:7, amino acids
2093-2164 of SEQ ID NO:7, amino acids 2165-2439 of SEQ ID NO:7, SEQ ID NO:8, SEQ
ID NO:10, SEQ ID NO:11, and SEQ ID NO:22.

13. An isolated nucleic acid molecule according to claim 12, wherein said
polypeptide comprises an amino acid sequence selected from the group consisting of: SEQ
ID NO:2, amino acids 11-437 of SEQ ID NO:2, amino acids 543-864 of SEQ ID NO:2,
amino acids 974-1273 of SEQ ID NO:2, amino acids 1314-1385 of SEQ ID NO:2, SEQ ID
NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids 118-125 of SEQ ID NO:3, amino
acids 199-212 of SEQ ID NO:3, amino acids 353-363 of SEQ ID NO:3, amino acids 549-
565 of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3, amino acids 669-684 of SEQ
ID NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3,
amino acids 903-912 of SEQ ID NO:3, amino acids 918-940 of SEQ ID NO:3, amino acids
1268-1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID NO:3, amino acids 973-
1256 of SEQ ID NO:3, amino acids 1344-1351 of SEQ ID NO:3, SEQ ID NO:4, amino acids
7-432 of SEQ ID NO:4, amino acids 539-859 of SEQ ID NO:4, amino acids 869-1037 of
SEQ ID NO:4, amino acids 1439-1684 of SEQ ID NO:4, amino acids 1722-1792 of SEQ ID
NO:4, SEQ ID NO:5, amino acids 39-457 of SEQ ID NO:5, amino acids 563-884 of SEQ ID
NO:5, amino acids 1147-1399 of SEQ ID NO:5, amino acids 1434-1506 of SEQ ID NO:5,
amino acids 1524-1950 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino
acids 2645-2895 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids
3024-3449 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 3886-
4048 of SEQ ID NO:5, amino acids 4433-4719 of SEQ ID NO:5, amino acids 4729-4974 of
SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID
NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5,
amino acids 6542-6837 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino
acids 7140-7211 of SEQ ID NO:5, SEQ ID NO:6, amino acids 35-454 of SEQ ID NO:6,
amino acids 561-881 of SEQ ID NO:6, amino acids 1143-1393 of SEQ ID NO:6, amino
acids 1430-1503 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, amino acids
2053-2373 of SEQ ID NO:6, amino acids 2383-2551 of SEQ ID NO:6, amino acids 2671-
3045 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, amino acids 3673-3745 of
SEQ ID NO:6, SEQ ID NO:7, amino acids 32-450 of SEQ ID NO:7, amino acids 556-877 of
SEQ ID NO:7, amino acids 887-1051 of SEQ ID NO:7, amino acids 1478-1790 of SEQ ID
NO:7, amino acids 1810-2055 of SEQ ID NO:7, amino acids 2093-2164 of SEQ ID NO:7,
amino acids 2165-2439 of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:11,
and SEQ ID NO:22.

14. An isolated nucleic acid molecule according to claim 12, wherein said nucleotide
sequence is substantially similar to a nucleotide sequence selected from the group
consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415-
5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of
SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ
ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID
NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1,
nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1,
nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1,
nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1,
nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1,
nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1,
nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1,
nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1,
nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1,
nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1,
nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1,
nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1,
nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1,
nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1,
nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1,
nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1,
nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1,
nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1,
nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1,
nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1,
nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1,
nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1,
nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1,
nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1,
nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1,
nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1,
nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1,
nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1,
nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1,
nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1,
nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1,
nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 of SEQ ID NO:1.

15. A nucleic acid molecule according to claim 12, wherein said nucleotide
sequence is selected from the group consisting of: the complement of nucleotides
1900-3171 of SEQ ID NO:1, nucleotides 3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of
SEQ ID NO:1, nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID
NO:1, nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 11549-11764 of SEQ ID
NO:1, nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID
NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID
NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID
NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID
NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID
NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID
NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID
NO:1, nucleotides 14788-15639 of SEQ ID NO:1, nucleotides 15901-15924 of SEQ ID
NO:1, nucleotides 16251-21749 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID
NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 18855-19361 of SEQ ID
NO:1, nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID
NO:1, nucleotides 21746-43519 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID
NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ ID
NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID
NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID
NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID
NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID
NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID
NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID
NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID
NO:1, nucleotides 41369-42256 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID
NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 43524-54920 of SEQ ID
NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID
NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID
NO:1, nucleotides 48087-49361 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID
NO:1, nucleotides 50670-51176 of SEQ ID NO:1, nucleotides 51534-52657 of SEQ ID
NO:1, nucleotides 53697-54431 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID
NO:1, nucleotides 54935-62254 of SEQ ID NO:1, nucleotides 55028-56284 of SEQ ID
NO:1, nucleotides 56600-57565 of SEQ ID NO:1, nucleotides 57593-58087 of SEQ ID
NO:1, nucleotides 59366-60304 of SEQ ID NO:1, nucleotides 60362-61099 of SEQ ID
NO:1, nucleotides 61211-61426 of SEQ ID NO:1, nucleotides 61427-62254 of SEQ ID
NO:1, nucleotides 62369-63628 of SEQ ID NO:1, nucleotides 67334-68251 of SEQ ID
NO:1, and nucleotides 1-68750 of SEQ ID NO:1.

16. A chimeric gene comprising a heterologous promoter sequence operatively 
linked to a nucleic acid molecule according to claim 12. 

17. A recombinant vector comprising a chimeric gene according to claim 16.

18. A recombinant host cell comprising a chimeric gene according to claim 16. 

19. The recombinant host cell of claim 18, which is a bacterium.

20. The recombinant host cell of claim 19, which is an Actinomycete.

21. The recombinant host cell of claim 20, which is Streptomyces.

22. An isolated nucleic acid molecule according to claim 1, wherein said nucleotide
sequence comprises a consecutive 20 base pair nucleotide portion identical in sequence to
a consecutive 20 base pair portion of a nucleotide sequence selected from the group
consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides
3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of
SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ
ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID
NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID
NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID
NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID
NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID
NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID
NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID
NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID
NO:1, nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID
NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID
NO:1, nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID
NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID
NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID
NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID
NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID
NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID
NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID
NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID
NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID
NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID
NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID
NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID
NO:1, nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID
NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID
NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID
NO:1, nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID
NO:1, nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID
NO:1, nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID
NO:1, nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID
NO:1, nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID
NO:1, nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID
NO:1, nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID
NO:1, nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 of SEQ ID NO:1.

23. A chimeric gene comprising a heterologous promoter sequence operatively 
linked to a nucleic acid molecule according to claim 22. 

24. A recombinant vector comprising a chimeric gene according to claim 23. 

25. A recombinant host cell comprising a chimeric gene according to claim 23. 

26. The recombinant host cell of claim 25, which is a bacterium.

27. The recombinant host cell of claim 26, which is an Actinomycete.

28. The recombinant host cell of claim 27, which is Streptomyces. 

29. An isolated nucleic acid molecule comprising a nucleotide sequence that 
encodes at least one epothilone synthase domain. 

30. An isolated nucleic acid molecule according to claim 29, wherein said epothilone
synthase domain is a β-ketoacyl-synthase domain comprising an amino acid sequence
substantially similar to an amino acid sequence selected from the group consisting of:
amino acids 11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4, amino acids
39-457 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 3024-3449 of
SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 35-454 of SEQ ID
NO:6, amino acids 1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ ID NO:7.

31. An isolated nucleic acid molecule according to claim 30, wherein said
β-ketoacyl-synthase domain comprises an amino acid sequence selected from the group
consisting of: amino acids 11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4,
amino acids 39-457 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids
3024-3449 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 35-454
of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ
ID NO:7.

32. An isolated nucleic acid molecule according to claim 30, wherein said nucleotide 
sequence is substantially similar to a nucleotide sequence selected from the group 
consisting of: nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID
NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID
NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID
NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID
NO:1, and nucleotides 55028-56284 of SEQ ID NO:1. 

33. An isolated nucleic acid molecule according to claim 30, wherein said nucleotide
sequence comprises a consecutive 20 base pair nucleotide portion identical in sequence to
a consecutive 20 base pair portion of a nucleotide sequence selected from the group
consisting of: nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID
NO:1, nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID
NO:1, nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID
NO:1, nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID
NO:1, and nucleotides 55028-56284 of SEQ ID NO:1.

34. An isolated nucleic acid molecule according to claim 30, wherein said nucleotide
sequence is selected from the group consisting of: nucleotides 7643-8920 of SEQ ID NO:1,
nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID NO:1,
nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1,
nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1,
nucleotides 48087-49361 of SEQ ID NO:1, and nucleotides 55028-56284 of SEQ ID NO:1.

35. An isolated nucleic acid molecule according to claim 29, wherein said epothilone 
synthase domain is an acyltransferase domain comprising an amino acid sequence
substantially similar to an amino acid sequence selected from the group consisting of:
amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, amino acids
563-884 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 3555-3876
of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 561-881 of SEQ ID
NO:6, amino acids 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7. 

36. An isolated nucleic acid molecule according to claim 35, wherein said 
acyltransferase domain comprises an amino acid sequence selected from the group 
consisting of: amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4,
amino acids 563-884 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino
acids 3555-3876 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids
561-881 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, and amino acids
556-877 of SEQ ID NO:7.

37. An isolated nucleic acid molecule according to claim 35, wherein said nucleotide
sequence is substantially similar to a nucleotide sequence selected from the group
consisting of: nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ
ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID
NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID
NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID
NO:1, and nucleotides 56600-57565 of SEQ ID NO:1.

38. An isolated nucleic acid molecule according to claim 35, wherein said nucleotide
sequence comprises a consecutive 20 base pair nucleotide portion identical in sequence to
a consecutive 20 base pair portion of a nucleotide sequence selected from the group
consisting of: nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ
ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID
NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID
NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID
NO:1, and nucleotides 56600-57565 of SEQ ID NO:1.

39. An isolated nucleic acid molecule according to claim 35, wherein said nucleotide
sequence is selected from the group consisting of: nucleotides 9236-10201 of SEQ ID
NO:1, nucleotides 17865-18827 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID
NO:1, nucleotides 27911-28876 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID
NO:1, nucleotides 38636-39598 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID
NO:1, nucleotides 49680-50642 of SEQ ID NO:1, and nucleotides 56600-57565 of SEQ ID
NO:1.

40. An isolated nucleic acid molecule according to claim 29, wherein said epothilone 
synthase domain is an enoyl reductase domain comprising an amino acid sequence 
substantially similar to an amino acid sequence selected from the group consisting of: 
amino acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino 
acids 6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7. 

41. An isolated nucleic acid molecule according to claim 40, wherein said enoyl 
reductase domain comprises an amino acid sequence selected from the group consisting 
of: amino acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino
acids 6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7. 

42. An isolated nucleic acid molecule according to claim 40, wherein said nucleotide 
sequence is substantially similar to a nucleotide sequence selected from the group 
consisting of: nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ 
ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, and nucleotides 59366-60304 of SEQ 
ID NO:1. 

43. An isolated nucleic acid molecule according to claim 40, wherein said nucleotide 
sequence comprises a consecutive 20 base pair nucleotide portion identical in sequence to 
a consecutive 20 base pair portion of a nucleotide sequence selected from the group 
consisting of: nucleotides 10529-11428 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ
ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, and nucleotides 59366-60304 of SEQ
ID NO:1. 

44. An isolated nucleic acid molecule according to claim 40, wherein said nucleotide
sequence is selected from the group consisting of: nucleotides 10529-11428 of SEQ ID
NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID
NO:1, and nucleotides 59366-60304 of SEQ ID NO:1.

45. An isolated nucleic acid molecule according to claim 29, wherein said epothilone 
synthase domain is an acyl carrier protein domain comprising an amino acid sequence 
substantially similar to an amino acid sequence selected from the group consisting of: 
amino acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino
acids 1434-1506 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids
5010-5082 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, amino acids
1430-1503 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, and amino acids
2093-2164 of SEQ ID NO:7.

46. An isolated nucleic acid molecule according to claim 45, wherein said acyl 
carrier protein domain comprises an amino acid sequence selected from the group
consisting of: amino acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID
NO:4, amino acids 1434-1506 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5,
amino acids 5010-5082 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, amino
acids 1430-1503 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, and amino
acids 2093-2164 of SEQ ID NO:7. 

47. An isolated nucleic acid molecule according to claim 45, wherein said nucleotide 
sequence is substantially similar to a nucleotide sequence selected from the group 
consisting of: nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ
ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID
NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID
NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID
NO:1, and nucleotides 61211-61426 of SEQ ID NO:1.

48. An isolated nucleic acid molecule according to claim 45, wherein said nucleotide 
sequence comprises a consecutive 20 base pair nucleotide portion identical in sequence to 
a consecutive 20 base pair portion of a nucleotide sequence selected from the group 
consisting of: nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ 
ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID
NO:1, nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID
NO:1, nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID
NO:1, and nucleotides 61211-61426 of SEQ ID NO:1. 

49. An isolated nucleic acid molecule according to claim 45, wherein said nucleotide 
sequence is selected from the group consisting of: nucleotides 11549-11764 of SEQ ID
NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID
NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID
NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID
NO:1, nucleotides 54540-54758 of SEQ ID NO:1, and nucleotides 61211-61426 of SEQ ID 
NO:1. 

50. An isolated nucleic acid molecule according to claim 29, wherein said epothilone 
synthase domain is a dehydratase domain comprising an amino acid sequence 
substantially similar to an amino acid sequence selected from the group consisting of:
amino acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 of SEQ ID NO:5, amino
acids 5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID NO:6, and amino 
acids 887-1051 of SEQ ID NO:7. 

51. An isolated nucleic acid molecule according to claim 50, wherein said 
dehydratase domain comprises an amino acid sequence selected from the group consisting 
of: amino acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 of SEQ ID NO:5, amino 
acids 5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID NO:6, and amino 
acids 887-1051 of SEQ ID NO:7. 

52. An isolated nucleic acid molecule according to claim 50, wherein said nucleotide 
sequence is substantially similar to a nucleotide sequence selected from the group 
consisting of: nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ
ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID
NO:1, and nucleotides 57593-58087 of SEQ ID NO:1.

53. An isolated nucleic acid molecule according to claim 50, wherein said nucleotide 
sequence comprises a consecutive 20 base pair nucleotide portion identical in sequence to 
a consecutive 20 base pair portion of a nucleotide sequence selected from the group 
consisting of: nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 33401-33889 of SEQ
ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID
NO:1, and nucleotides 57593-58087 of SEQ ID NO:1. 

54. An isolated nucleic acid molecule according to claim 50, wherein said nucleotide 
sequence is selected from the group consisting of: nucleotides 18855-19361 of SEQ ID 
NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID 
NO:1, nucleotides 50670-51176 of SEQ ID NO:1, and nucleotides 57593-58087 of SEQ ID
NO:1. 

55. An isolated nucleic acid molecule according to claim 29, wherein said epothilone 
synthase domain is a β-ketoreductase domain comprising an amino acid sequence
substantially similar to an amino acid sequence selected from the group consisting of:
amino acids 1439-1684 of SEQ ID NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino
acids 2645-2895 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids
6857-7101 of SEQ ID NO:5, amino acids 1143-1393 of SEQ ID NO:6, amino acids
3392-3636 of SEQ ID NO:6, and amino acids 1810-2055 of SEQ ID NO:7.

56. An isolated nucleic acid molecule according to claim 55, wherein said
β-ketoreductase domain comprises an amino acid sequence selected from the group
consisting of: amino acids 1439-1684 of SEQ ID NO:4, amino acids 1147-1399 of SEQ ID
NO:5, amino acids 2645-2895 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, 
amino acids 6857-7101 of SEQ ID NO:5, amino acids 1143-1393 of SEQ ID NO:6, amino 
acids 3392-3636 of SEQ ID NO:6, and amino acids 1810-2055 of SEQ ID NO:7. 

57. An isolated nucleic acid molecule according to claim 55, wherein said nucleotide 
sequence is substantially similar to a nucleotide sequence selected from the group 
consisting of: nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ
ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID
NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID
NO:1, nucleotides 53697-54431 of SEQ ID NO:1, and nucleotides 60362-61099 of SEQ ID
NO:1. 

58. An isolated nucleic acid molecule according to claim 55, wherein said nucleotide 
sequence comprises a consecutive 20 base pair nucleotide portion identical in sequence to 
a consecutive 20 base pair portion of a nucleotide sequence selected from the group 
consisting of: nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ 
ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID
NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID 
NO:1, nucleotides 53697-54431 of SEQ ID NO:1, and nucleotides 60362-61099 of SEQ ID 
NO:1. 

59. An isolated nucleic acid molecule according to claim 55, wherein said nucleotide
sequence is selected from the group consisting of: nucleotides 20565-21302 of SEQ ID
NO:1, nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID
NO:1, nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID
NO:1, nucleotides 46950-47702 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID
NO:1, and nucleotides 60362-61099 of SEQ ID NO:1.

60. An isolated nucleic acid molecule according to claim 29, wherein said epothilone 
synthase domain is a methyltransferase domain comprising an amino acid sequence
substantially similar to amino acids 2671-3045 of SEQ ID NO:6.

61. An isolated nucleic acid molecule according to claim 60, wherein said
methyltransferase domain comprises amino acids 2671-3045 of SEQ ID NO:6.

62. An isolated nucleic acid molecule according to claim 60, wherein said nucleotide
sequence is substantially similar to nucleotides 51534-52657 of SEQ ID NO:1.

63. An isolated nucleic acid molecule according to claim 60, wherein said nucleotide
sequence comprises a consecutive 20 base pair nucleotide portion identical in sequence to
a consecutive 20 base pair portion of nucleotides 51534-52657 of SEQ ID NO:1.

64. An isolated nucleic acid molecule according to claim 60, wherein said nucleotide
sequence is nucleotides 51534-52657 of SEQ ID NO:1.

65. An isolated nucleic acid molecule according to claim 29, wherein said epothilone 
synthase domain is a thioesterase domain comprising an amino acid sequence substantially 
similar to amino acids 2165-2439 of SEQ ID NO:7.

66. An isolated nucleic acid molecule according to claim 65, wherein said 
thioesterase domain comprises amino acids 2165-2439 of SEQ ID NO:7. 

67. An isolated nucleic acid molecule according to claim 65, wherein said nucleotide 
sequence is substantially similar to nucleotides 61427-62254 of SEQ ID NO:1. 

68. An isolated nucleic acid molecule according to claim 65, wherein said nucleotide 
sequence comprises a consecutive 20 base pair nucleotide portion identical in sequence to 
a consecutive 20 base pair portion of nucleotides 61427-62254 of SEQ ID NO:1.

69. An isolated nucleic acid molecule according to claim 65, wherein said nucleotide
sequence is nucleotides 61427-62254 of SEQ ID NO:1. 

70. An isolated nucleic acid molecule comprising a nucleotide sequence that
encodes a non-ribosomal peptide synthetase, wherein said non-ribosomal peptide
synthetase comprises an amino acid sequence substantially similar to an amino acid
sequence selected from the group consisting of: SEQ ID NO:3, amino acids 72-81 of SEQ
ID NO:3, amino acids 118-125 of SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3,
amino acids 353-363 of SEQ ID NO:3, amino acids 549-565 of SEQ ID NO:3, amino acids
588-603 of SEQ ID NO:3, amino acids 669-684 of SEQ ID NO:3, amino acids 815-821 of
SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3,
amino acids 918-940 of SEQ ID NO:3, amino acids 1268-1274 of SEQ ID NO:3, amino
acids 1285-1297 of SEQ ID NO:3, amino acids 973-1256 of SEQ ID NO:3, and amino acids
1344-1351 of SEQ ID NO:3.

71. An isolated nucleic acid molecule according to claim 70, wherein said
non-ribosomal peptide synthetase comprises an amino acid sequence selected from the group
consisting of: SEQ ID NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids 118-125 of
SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3, amino acids 353-363 of SEQ ID NO:3,
amino acids 549-565 of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3, amino acids
669-684 of SEQ ID NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids 868-892 of
SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3, amino acids 918-940 of SEQ ID NO:3,
amino acids 1268-1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID NO:3, amino
acids 973-1256 of SEQ ID NO:3, and amino acids 1344-1351 of SEQ ID NO:3.

72. An isolated nucleic acid molecule according to claim 70, wherein said nucleotide
sequence is substantially similar to a nucleotide sequence selected from the group
consisting of: nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ
ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID
NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID
NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID
NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID
NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID
NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID
NO:1, nucleotides 14788-15639 of SEQ ID NO:1, and nucleotides 15901-15924 of SEQ ID
NO:1.

73. An isolated nucleic acid molecule according to claim 70, wherein said nucleotide
sequence comprises a consecutive 20 base pair nucleotide portion identical in sequence to
a consecutive 20 base pair portion of a nucleotide sequence selected from the group
consisting of: nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ
ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID
NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID
NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID
NO:1, nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID
NO:1, nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID
NO:1, nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID
NO:1, nucleotides 14788-15639 of SEQ ID NO:1, and nucleotides 15901-15924 of SEQ ID
NO:1.

74. An isolated nucleic acid molecule according to claim 70, wherein said nucleotide
sequence is selected from the group consisting of: nucleotides 11872-16104 of SEQ ID
NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID
NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID
NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID
NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID
NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID
NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID
NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID
NO:1, and nucleotides 15901-15924 of SEQ ID NO:1.

75. A method for heterologous expression of epothilone in a recombinant host, 
comprising: 

(a) introducing a chimeric gene according to claim 4 into a host; and 

(b) growing the host in conditions that allow biosynthesis of epothilone in the host. 

76. A method for producing epothilone, comprising:

(a) expressing epothilone in a recombinant host by the method of claim 75; and 

(b) extracting epothilone from the recombinant host. 

77. An isolated polypeptide comprising an amino acid sequence that consists of an 
epothilone synthase domain. 

78. An isolated polypeptide according to claim 77, wherein said epothilone synthase 
domain is a β-ketoacyl-synthase domain comprising an amino acid sequence substantially
similar to an amino acid sequence selected from the group consisting of: amino acids
11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4, amino acids 39-457 of SEQ ID
NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5,
amino acids 5103-5525 of SEQ ID NO:5, amino acids 35-454 of SEQ ID NO:6, amino acids
1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ ID NO:7.

79. An isolated polypeptide according to claim 78, wherein said β-ketoacyl-synthase
domain comprises an amino acid sequence selected from the group consisting of: amino
acids 11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4, amino acids 39-457 of
SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID
NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 35-454 of SEQ ID NO:6, amino
acids 1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ ID NO:7.

80. An isolated polypeptide according to claim 77, wherein said epothilone synthase 
domain is an acyltransferase domain comprising an amino acid sequence substantially
similar to an amino acid sequence selected from the group consisting of: amino acids
543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, amino acids 563-884 of SEQ
ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5,
amino acids 5631-5951 of SEQ ID NO:5, amino acids 561-881 of SEQ ID NO:6, amino
acids 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7.

81. An isolated polypeptide according to claim 80, wherein said acyltransferase
domain comprises an amino acid sequence selected from the group consisting of: amino
acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, amino acids
563-884 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 3555-3876 of
SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 561-881 of SEQ ID
NO:6, amino acids 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7.

82. An isolated polypeptide according to claim 77, wherein said epothilone synthase 
domain is an enoyl reductase domain comprising an amino acid sequence substantially 
similar to an amino acid sequence selected from the group consisting of: amino acids 974- 
1273 of SEQ ID NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino acids 6542-6837 of 
SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7. 

83. An isolated polypeptide according to claim 82, wherein said enoyl reductase 
domain comprises an amino acid sequence selected from the group consisting of: amino 
acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino acids
6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7. 

84. An isolated polypeptide according to claim 77, wherein said epothilone synthase 
domain is an acyl carrier protein domain, wherein said polypeptide comprises an amino acid 
sequence substantially similar to an amino acid sequence selected from the group 
consisting of: amino acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID 
NO:4, amino acids 1434-1506 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, 
amino acids 5010-5082 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, amino
acids 1430-1503 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, and amino 
acids 2093-2164 of SEQ ID NO:7. 

85. An isolated polypeptide according to claim 84, wherein said acyl carrier protein 
domain comprises an amino acid sequence selected from the group consisting of: amino 
acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids 
1434-1506 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids
5010-5082 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 of
SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of 
SEQ ID NO:7. 

86. An isolated polypeptide according to claim 77, wherein said epothilone synthase
domain is a dehydratase domain comprising an amino acid sequence substantially similar to
an amino acid sequence selected from the group consisting of: amino acids 869-1037 of
SEQ ID NO:4, amino acids 3886-4048 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID
NO:5, amino acids 2383-2551 of SEQ ID NO:6, and amino acids 887-1051 of SEQ ID NO:7.

87. An isolated polypeptide according to claim 86, wherein said dehydratase 
domain comprises an amino acid sequence selected from the group consisting of: amino 
acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 of SEQ ID NO:5, amino acids
5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID NO:6, and amino acids
887-1051 of SEQ ID NO:7.

88. An isolated polypeptide according to claim 77, wherein said epothilone synthase
domain is a β-ketoreductase domain comprising an amino acid sequence substantially
similar to an amino acid sequence selected from the group consisting of: amino acids
1439-1684 of SEQ ID NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino acids 2645-2895 of
SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID
NO:5, amino acids 1143-1393 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6,
and amino acids 1810-2055 of SEQ ID NO:7.

89. An isolated polypeptide according to claim 88, wherein said β-ketoreductase
domain comprises an amino acid sequence selected from the group consisting of: amino
acids 1439-1684 of SEQ ID NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino acids
2645-2895 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids
6857-7101 of SEQ ID NO:5, amino acids 1143-1393 of SEQ ID NO:6, amino acids 3392-3636 of
SEQ ID NO:6, and amino acids 1810-2055 of SEQ ID NO:7.

90. An isolated polypeptide according to claim 77, wherein said epothilone synthase
domain is a methyltransferase domain comprising an amino acid sequence substantially
similar to amino acids 2671-3045 of SEQ ID NO:6.

91. An isolated polypeptide according to claim 90, wherein said methyltransferase
domain comprises amino acids 2671-3045 of SEQ ID NO:6.

92. An isolated polypeptide according to claim 77, wherein said epothilone synthase
domain is a thioesterase domain comprising an amino acid sequence substantially similar to
amino acids 2165-2439 of SEQ ID NO:7.

93. An isolated polypeptide according to claim 92, wherein said thioesterase domain
comprises amino acids 2165-2439 of SEQ ID NO:7. 

SEQUENCE LISTING

<110> Novartis AG 

<120> GENES FOR THE BIOSYNTHESIS OF EPOTHILONES

<130> 4-30582A 

<140> 
<141> 

<160> 30 

<170> PatentIn Ver. 2.0

<210> 1
<211> 68750 
<212> DNA 

<213> Sorangium cellulosum 



<400> 1 

aagcttcgct cgacgccctc ctcgcccgcg ccacctc-gc ccgcgcgccc gacgatggcc 60
acggccgggc cacggagcgg catgcgcccg ccgaggcgcg cgggatcgag gacccccgcg 120
ccctccgaga gcacctccgc atccaggaag gggggccgcc cr^tcactgc acgcgcctcg 180
gcgacctgac ggtggagccc cccgcgcacg accagccccc cgcgcccatc agcttccacc 240
acgcccgcag cccgaggcac cccgactgga ccccggacgc gacgctcgcc gacggccccg 300
cgcccgtccg gcggctcgcc gcgcgcggcg cgccgggtcc cccccgcgag cacgaagagg 360
agcgcgagcg agcccgaacc gcgcaggagg cgaggcgccc g^ggctcgcg gccgcgccgc 420
cccgcctcgc gcccgatctg ccccgccccg aggacgacgc caacgggccg ccgctcggcc 480
cgatgtcgcc cgaagtcgcc gaggccgagc ggcgcccccg cgcctcgtac gcgactcctg 540
agcccgcctg tgccgcgccg ctcgcccggc tcgggacggg cgcgggtccc tggtccggac 600
accccgccta cgagatgctg ccagagaacc tgccccccgg gtttggcctc ccgaccgcga 660
tcgccgcggc ctccgcgccc ggcacatcgg aggccgctct ccgcggcgca gcgcggctgt 720
tcgcctcctg ggaggtcgta tcgagcaaga agagccagct cggcaacatc cccgaagccc 780
tgcgggagcg gctccggacg atcgcccgcg cgacgggcaa tgccgacaac ccctctcgcc 840
tcgagcgcgc cgaggcgatc gcggcggagg tgcgccgccc gcgcgcacag ccggcgccct 900
tcgcggcggg cgccggcccg gcggtcgccg gggtcccccc gagcggccgg ctctcgggcc 960
tcgcgaccga cggagacgca ctgtactccg gcgacggcaa cgacatcguc acgtcccaac 1020
ccggccggac cccgccggtc gtgccgcccg ccggaaccga tcccctcctc gagcccgcac 1080
cgccccccag ccagatgctc tccgccgcgc acgccaacgc gggcaccacc tccaaggccc 1140
tgacggaagg cagccccctc atcgtgatgg caagaaacca ggcgcgaccg atgagcctcg 1200
tccacgctcg cgggttcacg gcgtgggtca accaggccac ggcgcccgac cccgagcggg 1260
gcgcgccctt cgtcgcccag cgctcgacca tcacggaact cgagcacccc acgcctcgct 1320
gtctccacga gcccgccggc agcgctttct ccctcgcccg cgacgaggag cacccccacc 1380
ggtgcgagcc rucggccggc cggctcgagc catggcgcca cccgcaccac cgccccggcg 1440
ccccgagccg crtcgcgcac ctcggcgagc accccatcgc ggcgacctgg cacccctcgc 1500
tcaccctcaa tgcgacccac gngccgcggg ccgacccnga ccgcagggcc atcctcgggg 1560
tcgacaagcg caccggcgta gagcccatcg tccccgcgga gacgcgccat cccccggcgc 1620
acgtcgtgtc cgaggaccgg gacatcttcg cgcttaccgg acagcccgac tcccgcgact 1680
ggcacgtcga gcacatccgc cccggcgcct ccaccgccgc ggccgaccac cagcgccagc 1740
tatgggaccg ccctgacatg gtgctcaatc ggcgcggcct cttcttcacg acgaacgacc 1800
gcatcctgac gctcgcccgc agctgacacc gctcgacgcc gggccgctca ccgagggcgc 1860
ccggaccgag ccggcgaccc gccgctggcg ggccgcagct catgccgatt cggtggcgac 1920
gcagacgctg cgccagaaac gctcgagagc ccccgagaac aggaagccgg cggattgtgc 1980
caccacgacc ccgatcagct cgcggcccgg ancattgatc caggacgtcc cgaacccgcc 2040
gtcccaccca tagcgcccgg gcacctccga gaccgcgtcc ggcgccgtga ccacggccat 2100
cccataaccc cagccgtgcg tcncgaagaa gcccgggaaa aacgaggacg ccgccttctg 2160
ggccggcgtg aggtgatcgg ccgtcatctc gcgcaccgag gcggcgctca agagccgccg 2220
gccctcgtgc acaccgccgt tcatgagcat gcgcgcgaac aggaggtagt cgtccaccgt 2280
cgacacgagc ccggcggcgc ccgaagggaa cgccggcggg ctggcatagg cgcccccggc 2340
cccgtcgcga tccatgcgcg tcctcccccc cgtctgcccg tcggtgaagt aaccgcagcc 2400
cgcgaaccga gcgagcctgc ccgccgggac gtgaaagtcg gtgtcccgca tcccgagcgg 2460
cgcgaggatg cgcccgcgca cgaacgcatc gaagccctgg ccggccgcgc gccccacgag 2520
caccccctgc accaggctcc ccgtgttgca cacccactgc gcccccggct gacgcatgag 2580
cggcagcgcc ccgagccgcc ggacccaccc gtctggcccg tgcggcgcca tcggcaccgg 2640
ctgcgcgttg acgagcccga gctcgtcgat ggcccgctgg atcggcgacg atgcgtcgaa 2700
cgagattccg aagcccatcg tgaacgtcat caggtcgcgc accgtgatcg gccactccgc 2760 
gggcaccgtc tcgtcgatcg gaccatcgat gcgcgccagc accttccggt tcgcgagccc 2820 
cggcaaccat cggtcgacgg gggagccgag gtcgagcttg ccttcctcga cgagcatcat 2880 
caccgccgtc gcggtgaccg ccttcgccac cgaggcgacc cggaagatcg tgtcccgccg 2940 
catgggcgcg ctgccgccga gctcggtcac gcccaccgcg tccacgtgca cgccgtcgcc 3000 
gcgcgcgacc agccagaccg ctcccggcat ctgccccgcc gccacctccg ccgccaccac 3060 
cccgcgcgcg ggcgccagcg cgccggcccc cgcgtcctgc cctggccgcc cctcctcccc 3120 
ggccccaccc aacgcgcacc ccggcgccgc cacgctgatc aaagctccca taaactcccg 3180 
ccttctcatg accgtcgatg cctctccgag cgggggcgcc tgcccctgcc gagagcaccg 3240 
actgcccgcg cccgaaaaaa tcatcggtgc cccgtcacga tcgccgccgg gcgcggctcc 3300 
gcccggccgc ccgctcgggc gcccgcccct ggacgagcaa agctcgcccg cccgcgctca 3360 
gcacgccgct tgccatgtcc ggcctgcacc cacaccgagg agccacccac cctgatgcac 3420 
ggcctcaccg agcggcaggt cctgctctcg ctcgccaccc tcgcgctcat cctcgtgacc 3480 
gcgcgcgcct ccggcgagct cgcgcggcgg ctgcgccagc ccgaggtgct cggggagccc 3540 
tccggcggcg tcgtgctggg cccctccgtc gtcggcgcgc tcgcgcccgg gttccatcga 3600 
gccctctccc aggagccggc ggtcggggcc gtgctctcgg gcatctcctg gataggcgcg 3660 
ctcctcctgc tgctgatggc gggcatcgag gtcgacgtgg gcatcctgcg caaggaggcg 3720 
cgccccgggg cgctctcggc gctcggcgcg atcgcgcccc cgctcgcggc aggcgccgcc 3780 
ttctcggcgc ccgtgctcga tcggcccctt ccgagcggcc tcttcctcgg gatcgtgctc 3840 
ccggtgacgg cggtcagcgt gatcgcgaag gtgctgatcg agcgcgagtc gatgcgccgc 3900 
agctatgcgc aggtgacgct cgcggcgggg gtggtcagcg aggtcgctgc ctgggtgctc 3960 
gtcgcgatga cgtcgtcgag ctacggcgcg tcgcccgcgc tggcggtcgc ccggagcgcg 4020 
ctcctggcga gcggattctt gctgttcatg gtgctcgtcg ggcggcggcc cacccacctc 4080 
gcgatgcgct gggtggccga cgcgacgcgc gtctccaagg gacaggtgtc gctcgtcctc 4140 
gtcctcacgt tcctggccgc ggcgctgacg cagcggctcg gcctgcaccc gctgctcggc 4200 
gcgttcgcgc tcggcgtgct gctcaacagc gctcctcgca ccaaccgccc tctcctcgac 4260 
ggcgtgcaga cgctcgtggc gggcctcttc gcgcctgtgt tcttcgtccc cgcgggcatg 4320
cgcgtcgacg tgtcgcagct gcgcacgccg gcggcgtggg ggacggtcgc gttgctgccg 4380 
gcgaccgcga cggcggcgaa ggtcgtcccc gccgcgctcg gcgcgcggct cagcgggctc 4440 
aggggcagcg aggcggcgct cgtggcggtg ggcctgaaca tgaagggcgg cacggacctc 4500 
atcgtcgcga tcgtcggcgt cgagctcggg ctcctctcca acgaggctta tacgatgtac 4560 
gccgtcgccg cgctggtcac ggtgaccgcc tcacccgcgc tcctcatctg gctcgagaaa 4620 
agggcgcctc cgacgcagga ggagtcggct cgcctcgagc gcgaggaggc cgcgaggcgc 4680 
gcgcacaccc ccggggtcga gcggatcctc gtcccgatcg tggcgcacgc cctgcccggg 4740 
ttcgccacgg acatcgtgga gagcatcgtc gcctccaagc gaaagctcgg cgagacggtc 4800 
gacatcacgg agctctccgt ggagcagcag gcgcccggcc catcgcgcgc cgcgogggag 4860 
gcgagccggg ggctcgcgag gctcggcgcg cgcctccgcg tcggcatctg gcggcaaagg 4920 
cgcgagctgc gcggctcgat ccaggcgatc ctgcgcgcct cgcgggatca caatctactc 4980 
gtgatcggcg cgcgatcgcc ggcgcgcgcg cgcggaatgt cgttcggtcg cccgcaggac 5040 
gcgatcgtcc agcgggccga gtccaacgtg ctcgtcgtgg tgggcgaccc tccggcggcg 5100 
gagcgcgcct ccgcgcggcg gaccctcgtc ccgatcatcg gcctcgagta ctccttcgcc 5160 
gccgccgatc tcgcggccca cgcggcgctg gcgtgggacg ccgagctcgt gctgctcagc 5220 
agcgcgcaga ccgatccggg cgcggtcgtc tggcgcgatc gcgagccatc ccgggtgcgc 5280 
gcggtggcgc ggagcgtcgt cgacgaggcg gtcttccggg ggcgccggct cggcgcgcgc 5340 
gtctcgtcgc gcgtgcacgt gggcgcgcac ccgagcgacg agataacgcg ggagctcgcg 5400 
cgcgccccgt acgatctgct cgtgctcgga tgctacgacc atgggccgct cggccggctc 5460 
tacctcggca gcacggtcga gtcggtggtg gtccggagcc gggtgccggt cgcgttgctc 5520 
gtcgcgcatg gagggactcg agagcaggtg aggtgaggct tccaccgcgc tcgcccgtga 5580 
ggaagcgagc gcccggctct gccgacgatc gtcactcccg gtccgtgtag gcgatcgtgc 5640 
tgagcagcgc gttctccgcc tgacgcgagt cgagccgggt atgctgcacg accatggggg 5700 
cgtccgattc gatcacgctg gcatagtccg tatcgcgcgg gatcggctcg ggctcggtca 5760 
gatcgttgaa ccggacgtgc cgggtgcgcc tcgctggaac ggtcacccgg taaggcccgg 5820 
cggggtcgcg gtcgctgaag taaacggtga tggcgacctg cgcgtcccgg tccgacgcat 5880 
tcaacaggca ggccgtctca tggctcgtca tctgcggctc aggtccgttg ctcccgcctg 5940 
ggatgtagcc ctctgcgatt gcacagcgcg tccgcccgat cggcttgtcc atgtgtcctc 6000 
cctcctggct cctctttggc agcctccctc tgctgtccag gagcgacggc ctcttcgctc 6060 
gacgcgctcg gggatccatg gctgaggatc ctcgccgagc gctccctgcc gaccggcocg 6120 
ccgagcgccg acgggccttg aaagcgcgcg accggccagc ccggacgcgg gcccgagagg 6180 
gacagtgggt ccgccgtgaa gcagagaggc gatcgaggtg gtgagatgaa acacgtcgac 6240 
acgggccgac gattcggccg ccggataggg cacacgctcg gtcttctcgc gagcacggcg 6300 
ctcgccggct gcggcggtcc gagcgagaaa accgtgcagg gcacgcggct cgcgcccggc 6360 
gccgatgcgc gcgtcaccgc cgacgtcgac cccgacgccg cgaccacgcg gctggcggtg 6420 
gacgtcgttc acctctcgcc gcccgagcgg ctcgaggccg gcagcgagcg gttcgtcgtc 6480 
cggcagcgtc cgagccccga gtccccgcgg cgacgggtcg gagtgctcga ctacaatgct 6540 
gacagccgaa gaggcaagct ggccgagacg accgtgccgt acgccaactt cgagctgccc 6600
atcaccgccg agaagcagag cagccctcag tcgccatcgt ctgccgccgt catcgggccg 6660 
acgtctgtcg ggtgacatcg cgctatcagc agcgctgagc ccgccagcag gccccagggc 6720 
cctgcctcga tggccttccc catcacccct gcgcactccc ccagcgacgg ccgcgcagcg 6780 
acggccgcgt ccaagcaacc gccgtgccgg cgcggcccca cgcgcgcgac aggcgagcgt 6840 
cctggcgcgg cctgcgcatc gctggaagga ccggcggagc atggatagag aaccgaggac 6900
cgcgatcctt gttgccatcg cagccaacgt ggcgatcgcg gcggccaagt tcatcgccgc 6960 
cgccgcgacc ggcagctcgg cgaggcgtct gccgacttcg gcggcgtccc gcgcgtgccg 7020 
ctctacgaca acctcaagag cgccgtcgcc gagcgccacg gcgacgcgac ccggccccac 7080 
cccacgccgc cggccccgtc ggcgcaccac cgccccgagc cgcgccccgc cgccgtcgcc 7140 
cgcggcaacg agaagggccg cgtccagcgc gccatcacgg cgtggacgac acggcgcgga 7200 
aacgtcgtcg taaccgccca gcaatgtcac gggaacggcc cctcgaaacg gccccccgag 7260 
ggggccggcc ggggccgacg acaccgcgcg atccccccgc caattcccga tggcaaaaga 7320 
aaaattcgcc atagatcgta agccgtgaca gtggtctgtc ttacgttgcg ccctccgcac 7380 
ctcgagcgag ttctcccgga taaccttcaa tccccccgag gggggcccgg cccccggccc 7440
cccaggaagc ctgatcggga cgagctaatc cccacccacc tttccgaggc cccgctcaaa 7500
gggactagac cgagcgagac agccccctcg cagcgcgcga agaacctggg cctcgaccgg 7560
aggacgatcg acgcccgcga gcgggccagc cgctgaggat gcgcccgtcg tggcggatcg 7620
ccccatcgag cgcgcagccg aagacccgat tgcgaccgcc ggagcgagtt gccgcctgcc 7680
cggtggcgcg accgatctga gcgggttctg gacgcccctc gagggctcgc gcgacaccgt 7740 
cgggcgagcc cccgccgaac gctgggacgc agcagcgtgg cctgatcccg accccgatgc 7800 
cccggggaag acgcccgtta cgcgcgcatc ctccctgagc gacgcagcct gctccgacgc 7860 
ccccctcttc ggcaccccgc ctcgcgaagc gccgcggacg gaccctgcac accgaccctt 7920 
gctggaggtg tgctgggagg cgctggagaa cgccgcgacc gccccaccgg cgcccgccgg 7980 
tacggaaacg ggagcgctca tcgggaccgg cccgcccgaa cacgaggccg cgctgccgca 8040 
agcgacggcg cccgcagaga ccgacgctca tggcgggccg gggacgacgc ccagcgccgg 8100 
agcgggccga atcccgcacg cccccgggcc gcgagggccg cgcgtcgcgg cggacacggc 8160 
ccactcgtcc ccgctggcgg ccgctcaccc ggcccgccag agcccgcgcc ccggggaacg 8220 
ctccacggcc ctggctggcg gggtaccgcc gatgtcgccg ccgagcaccc tcgtgtggcc 8280
ctcgaaaacc cgggcgccgg ccagggacgg ccgccgcaag gcattcccgg cggaggccga 8340
tgggtccgga cgaggcgaag ggcgcgccgt cgcggtcctc aagcggccca gcggagcccg 8400 
cgcggacggc gatcggatat tggcggtgac ccgaggaccc gcgatcaatc acgacggcgc 8460 
gagcagcggc ctgaccgtgc cgaacgggag cccccaagaa atcgcgccga aacgggccct 8520 
ggcggacgca ggccgcgccg cgccctcggc gggttatgcc gaggcacacg gcacgggcac 8580
gacgcttggt gaccccaccg aaatccaagc tccgaatgcg gtacacggcc tcgggcgaga 8640 
tgtcgccacg ccgccgctga tcgggtcggt gaagaccaac cttggccacc ctgagcatgc 8700 
gtcggggacc accgggccgc tgaaggccgt cttgtcccct cagcacgggc agactcctgc 8760 
gcacccccac gcgcaggcgc tgaacccccg gatctcatgg ggcgatcccc ggctgaccgt 8820 
cacgcgcgcc cggacaccgt ggccggactg gaacacgccg cgacgggcgg gggcgagctc 8880 
gtccggcacg agcgggacca acgcgcacgt ggcgccggaa gaggcgccgg cggcgacgcg 8940 
cacaccgccg gcgccggagc gaccggcaga gccgccggtg ccgtcggcaa ggaccgcgtc 9000
agccccggat gcacaggcgg cgcggctgcg cgaccatctg gagacctacc ccccgcagtg 9060 
cccgggcgac gcggcgccca gcctggcgac gacgcgcagc gcgatggagc accggctcgc 9120 
ggtggcggcg acgccgaggg aggggctgcg ggcagccccg gacgctgcgg cgcagggaca 9180 
gacgccgccc ggtgcggcgc gcagtaccgc cgattcccca cgcggcaagc tcgcctttct 9240
cttcaccgga cagggggcgc agacgctggg cacgggccgc gggctgtacg atgcacggtc 9300
cgcgctccgc gaggcgcccg acccgtgcgt gaggctgtcc aaccaggagc ccgaccggcc 9360 
gctccgcgag gtgatgtggg ccgaaccggc cagcgtcgac gccgcgccgc tcgaccagac 9420 
agcctccacc cagccggcgc tgctcacctt cgaacatgcg ctcgccgcgc tgtggcggtc 9480 
gcggggtgta gagccggagt tggtcgccgg ccacagcacc ggcgagccgg tggctgcctg 9540 
cgtggcgggc gtgctcccgc ttgaggacgc ggcgcccctg gcggctgcgc gcgggcgcct 9600 
aacgcaggcg ctgccggccg gcggggcgat ggcgccgacc gaggcgccgg aggccgatgt 9660
ggctgccgcg gcggcgccgc acgcagcgcc ggtgtcgacc gccgcggtca acgctccgga 9720 
ccaggtggtc accgcgggcg ccgggcaacc cgcgcatgcg atcgcggcgg cgacggccgc 9780 
gcgcggggcg cgaaccaagg cgctccacgt cccgcatgcg tcccactcac cgctcacggc 9840 
cccgatgctg gaggcgttcg ggcgtgtggc cgagtcggtg agctaccggc ggccgtcgat 9900 
cgtcctggtc agcaatccga gcgggaaggc ccgcacagac gaggtgagct cgccgggcta 9960 
ttgggtacgc cacgcgcgag aggcggtgcg crtcgcggat ggagtgaagg cgctgcacgc 10020
ggccggcgcg ggcaccctcg tcgaggccgg xccgaaaccg acgctgctcg gcccggtgcc 10080 
tgcctgcatg ccggacgccc ggccggcgcc gcccgcaccg tcgcgcgctg ggcgtgacga 10140 
gccggcgacc gcgctcgagg cgctcggcgg gccctgggcc gtcggcggcc tggtctcctg 10200 
ggccggcccc tccccctcag gggggcggcg ggcgccgctg cccacgtacc cttggcagcg 10260 
cgagcgccac cggaccgaca cgaaagccga cgacgcggcg cgtggcgacc gccgtgctcc 10320 
gggagcgggt cacgacgagg ccgaggaggg gggcgcggcg cgcggcggcg accggcgcag 10380 
cgcccggccc gaccatccgc cgcccgagag cggacgccgg gagaaggtcg aggccgccgg 10440 
cgaccgtccg ttccggctcg agatcgatga gccaggcgtg cttgatcacc ccgtgcctcg 10500
ggccacggag cggcgcgccc ctggtctggg cgaggtcgag accgccgtcg acgcggcggg 10560 
gctcagcttc aatgatgtcc agctcgcgct gggcatggtg cccgacgacc tgccgggaaa 10620 
gcccaaccct ccgctgccgc tcggaggcga gtgcgccggg cgcatcgtcg ccgtgggcga 10680 
gggcgtgaac ggcctcgtgg tgggccaacc ggtcatcgcc ccttcggcgg gagcgtttgc 10740 
tacccacgtc accacgtcgg ctgcgctggt gctgcctcgg cctcaggcgc tctcggcgat 10800 
cgaggcggcc gccatgcccg tcgcgcacct gacggcatgg cacgcgctcg acagaatagc 10860 
ccgccttcag ccgggggagc gggtgctgat ccatgcggcg accggcgggg tcggcctcgc 10920 
cgcggtgcag tgggcgcagc acgcgggagc cgaggtccat gcgacggccg gcacgcccga 10980 
gaaacgcgcc tacccggagc cgctgggcgc gcggtatgtg agcgactccc gctcggaccg 11040 
gttcgccgcc gacgcgcgcg cgcggacggg cggcgaggga gtagacgccg tgctcaactc 11100 
gctctcgggc gagctgatcg acaagagttt caatctcctg cgatcgcacg gccggtttgt 11160 
ggagctcggc aagcgcgact gttacgcgga taaccagccc gggctgcggc cgttcctgcg 11220 
caatctctcc ttctcgctgg tggatctccg ggggatgatg ctcgagcggc cggcgcgggt 11280 
ccgtgcgctc ttggaggagc tcctcggcct gatcgcggca ggcgtgttca cccctccccc 11340 
catcgcgacg ctcccgaccg cccgtgccgc cgatgcgttc cggagcatgg cgcaggcgca 11400 
gcatcttggg aagctcgtac tcacgccggg cgacccggag gtccagaccc gtactccaac 11460 
ccacgcaggc gccggcccgt ccaccgggga tcgggacctg ctcgacaggc tcgcgtcagc 11520 
tgcgccggcc gcgcgcgcgg cggcgctgga ggcgttcctc cgcacgcagg tctcgcaggc 11580 
gctgcgcacg cccgaaatca aggtcggcgc ggaggcgccg ttcacccgcc tcggcatgga 11640 
cccgctcatg gccgtggagc cgcgcaaccg tatcgaggcg agcctcaacc cgaagctgtc 11700
gacgacgntc ctgtccacgt cccccaacac cgccttgctg gcccaaaacc tgttggatgc 11760 
tctcgccaca gccctctcct tggagcgggc ggcggcggag aacctacggg caggcgcgca 11820 
aaacgacttc gtcccatcgg gcgcagatca agactgggaa atcaccgccc tatgacgatc 11880 
aatcagcttc tgaacgagct cgagcaccag ggtatcaagc tggcggccga tggggagcgc 11940 
ctccagatac aggcccccaa gaacgccccg aacccgaacc ngcccgctcg aatctccgag 12000 
cacaaaagca cgatcctgac gatgctccgc cagagactcc ccgcagaacc catcgtgccc 12060 
gccccagccg agcggcacgc cccgcttcct ctcacagaca tccaagaatc ctaccggctg 12120 
ggccggacag gagcgttcac ggcccccagc gggatccacg cctatcgcga acacgactgt 12180 
acggacctcg acgcgccgag gccgagccgc gccnttcgga aagccgccgc gcggcacgac 12240 
acgccccggg cccacacgcc gcccgacacg acgcaggcga ccgagcctaa agccgacgcc 12300 
gacatcgaga tcaccgaccc gcgcgggccc gaccggagca cacgggaagc gaggctcgtg 12360 
tcgccgcgag acgcgatgtc gcaccgcatc tacgacaccg agcgccctcc gctctaccac 12420 
gtcgtcgccg ctcggccgga cgagcggcaa acccgtctcg tgcccagcat cgatctcatt 12480 
aacgttgacc taggcagccc gtccatcatc cccaaggact ggcccagcct ctacgaagat 12540 
cccgagacct ctctccctgc cctggagctc tcgtaccgcg actacgcacc cgcgctggag 12600 
tctcgcaaga agtctgaggc gcatcaacga tcgatggact actggaagcg gcgcatcgcc 12660 
gagctcccac ctccgccgac gcttccgatg aaggccgatc cacctaccct gaaggagatc 12720 
cgcttccggc acacggagca atggccgccg tcggactcct ggggccgatc oaagcggcgt 12780 
gtcggggagc gcgggctgac cccgacgggc gtcatcctgg ccgcacxttc cgaggtgacc 12840 
gggcgctgga gcgcgagccc ccggcctacg cccaacataa cgctcttcaa ccggctcccc 12900 
gtccatccgc gcgcgaacga tatcaccggg gacttcacgc cgatggtccc cctggacatc 12960 
gacaccactc gcgacaagag cctcgaacag cgcgctaagc grattcaaga gcagctgtgg 13020 
gaagcgatgg atcactgcga cgcaagcggt accgaggtcc agcgagaggc cgcccgggtc 13080 
ctggggaccc aacgaggcgc accgctcccc gcggtgccca cgagcgcgct taaccagcaa 13140
gtcgttggtg tcaccccgtt gcagaggctc ggaactccgg tgtacaccag cacgcagacc 13200 
ccccagccgc tgctggatca tcagctccac gagcacgatg gggacctcgc cctcgcgtgg 13260 
gacatcgtcg acggagtgtc cccgcccgac cttctggacg acatgctcga agcgtacgtc 13320 
gtctttctcc ggcggctcac tgaggaacca tggggtgaac aggcgcgccg ttcgcttccg 13380 
cctgcccagc xagaagcgcg ggcgagcgca aacgcgacca acgcgctgcc gagcgagcat 13440 
acgccgcacg gcctgctcgc ggcgcgggtc gagcagccgc ccatgcagct cgccgtggtg 13500 
tcggcgcgca agacgctcac gcacgaagag ccttcgcgcc gcccgcggcg acctggcgcg 13560 
cggctgcgcg agcagggggc acgcccgaac acattggccg cggcggcgat ggagaaaggc 13620 
cgggagcagg tcgtcgcggc tctcgcggcg ctcgagtcag gcgcgaccta cgtgccgatc 13680 
gatgccgacc taccggcgga gcgtacccac tacctcctcg accatggtga ggtaaagctc 13740 
gtgctgacgc agccatggct ggatggcaaa ctgtcatggc cgccggggat ccagcggctg 13800 
ctcgtgagcg aggccggcgc cgaaggcgac ggcgaccagc ctccgacgat gcccattcag 13860 
acaccttcgg atcccgcgca tgtcatctac acctcgggac ccacagggtt gcccaagggg 13920 
gtgatgatcg accatcgggg tgccgtcaac accaccctgg acatcaacga gcgcttcgaa 13980 
acagggcccg gagacagggc gctggcgctc tcctcgctga gcttcgacct ctcggtctac 14040 
gatgtgttcg ggatcctggc ggcgggcggt acgatcgtgg tgccggacgc gtccaagccg 14100 
cgcgatccgg cgcattgggc agagctgatc gaacgagaga aggcgacggc gcggaactcg 14160 
gtgccggcgc tgatgcggac gcccgtcgag cactttgagg gccgccccga ctcgctcgct 14220 
aggtccctgc ggctctcgct gctgagcggc gactggatcc cggtgggcct gcctggcgag 14280 
ccccaggcca tcaggcccgg cgtgtcggtg accagcctgg gcggggccac cgaagcgtcg 14340 
atctggtcca tcgggtaccc cgtgaggaac gtcgacctat cgtgggcgag catcccctac 14400
ggccgtccgc tgcgraacca gacgttccac gcgcccgatg aggcgctcga accgcgcccg 14460 
gtctgggttc cggggcaact ctacattggc ggggtcgggc tggcactggg ctaccggcgc 14520 
gatgaagaga agacgcgcaa gagcttcctc gcgcagcccg agaccgggga gcgcccccac 14580
aagaccggcg atctgggccg ctacctgccc gacggaaaca tcgagttcat ggggcgcgag 14640 
gacaaccaaa tcaagcttcg cggataccgc gtcgagcccg gggaaatcga ggaaacgccc 14700 
aagtcgcacc cgaacgtacg cgacgcggtg actgcgcccg ccgggaacga cgcggcgaac 14760 
aagctccctc cagcctatgt ggccccggag ggcacacgga gacgcgctgc cgagcaggac 14820
gcgagcccca agaccgagcg gatcgacgcg agagcacacg ccgccgaagc ggacggcttg 14880 
agcgacggcg agagggtgca gttcaagctc gctcgacacg gaccccggag ggacccggac 14940 
ggaaagcccg tcgtcgatcc gaccgggcag gatccgcggg aggcggggct ggacgcccac 15000 
gcgcgccgcc gtagcgtccg aacgtccccc gaggccccga ttccgtttgt tgagtctggt 15060 
cgattcccga gctgcttgag cagcgtggag cccgacggcg cgacccttcc caaaccccgc 15120 
tatccatcgg cgggcagcac gtacccggtg caaacctacg cgtatgtcaa atccggccgc 15180 
atcgagggcg tggacgaggg ctcctactat taccacccgt tcgagcaccg ttcgccgaag 15240
ctccccgacc acgggatcga gcgcggagcg cacgcccggc aaaacttcga cgtgcccgat 15300 
gaagcggcgt tcaacctcct gttcgtgggc aggaccgacg ccatcgagtc gccgcatgga 15360 
tcgccgtcgc gagaattttg cccgctggag gccggatata cggcgcagct cctgatggag 15420 
caggcgccct cccgcaacat cggcgtctgc ccggcggggc aattcaattt cgaacaggct 15480 
cggccggttc tcgacctgcg acactcggac gcttacgtgc acggcacgct gggcgggcgg 15540 
gtagacccgc ggcagttcca ggcctgtacg cccggtcagg actcctcacc gaggcgcgcc 15600 
acgacgcgcg gcgcccctcc cggccgcgag cagcacttcg ccgatatgct tcgcgacttc 15660 
ctgaggacca aactacccga gtacatggtg cctacagcct ccgcggagct cgatgcgc^g 15720 
ccgccgacgc ccaacggcaa ggccgatcgc aaggccccgc gcgagcggaa ggatacctcg 15780 
ccgccgcggc attcggggca cacggcgcca cgggaggccc tggaggagat cctcgtcgcg 15840
gtcgcacggg aggtgctcgg gctggaggcg gccgggcccc agcagagctc cgccgatcct 15900
ggtgcgacac cgattcacat cgttcgcatg aggagcccgc tgcagaagag gctggacagg 15960 
gagaccgcca tcaccgagtt gttccagtac ccgaaccccg gcncgccggc gcccggcttg 16020 
cgccgagact cgagagatct agatcagcgg ccgaacacgc aggaccgagt ggaggttcgg 16080 
cccaagggca ggagacgcag ctaagagcgc cgaacaaaac caggccgagc gggccgatga 16140 
gccgcaagcc cgcccgcgtc accctgggac tcacctgatc tgaccgcggg tacgcgccgc 16200 
aggtgtgcgc gttgagccgt gttgttcgaa cgctgaggaa cggtgagctc atggaagaac 16260
aagagtcctc cgctatcgca gccaccggca tgccgggccg cttcccgggg gcgcgggacc 16320 
tggacgaact ctggaggaac cttcgagacg gcacggaggc cgcgcagcgc tcctccgagc 16380 
aggagctcgc ggcgtccgga gtcgaccccg cgctggcgct ggacccgagc tacgtccggg 16440
cgggcagcgt gctggaagac gtcgaccggc ccgacgccgc tttcttcggc atcagcccgc 16500 
gcgaggcaga gctcatggat ccgcagcacc ggatcctcat ggaacgcgcc tgggaggcgc 16560
tgaagaacgc cggatacgac ccgacggccc acgagggctc tatcggcgcg tacgccggcg 16620 
ccaacatgag cccgtacttg acgtcgaacc tccacgagca cccagcgatg atgcggtggc 16680
ccggccggcc ccagacgctg atcggcaacg acaaggatca ccccgcgacc cacgcccccr 16740 
acaggccgaa tctgagaggg ccgagcatct ccgctcaaac tgcctgctcc acctcgctcg 16800 
tggcggctca cctggcgtgc atgagcctcc tggaccgcga gtgcgacatg gcgctggccg 16860 
gcgggattac cgtccggatc ccccaccgag ccggccatgt atatgctgag gggggcacct 16920 
cccctcccga cggccactgc cgggcctccg acgccaaggc gaacggcacg atcacgggca 16980 
acggctgcgg cgttgtcctc ctgaagccgc tggaccgggc gctctccgac ggtgaccccg 17040 
tccgcgcggt tatccctggg tctgccacaa acaacgacgg agcgaggaag accgggctca 17100
ctgcgcccag tgaggtgggc caggcgcaag cgatcatgga ggcgctggcg ctggcagggg 17160 
tcgaggcccg gtccatccaa tacatcgaga cccacgggac cggcacgctg cccggagacg 17220 
ccatcgagac ggcggcgctg cggcgggcgc tcggtcgcga cgcctcggcc cggaggcctt 17280 
gcgcgatcgg ctccgtgaag accggcaccg gacaccccga accggcggcc ggcaccgccg 17340 
gcctgaccaa gacggtctcg gcgccggagc accggcagct gccgcccagc ccgaacctcg 17400 
agtctcctaa cccaccgacc gattccgcga gcagcccgct ctacgtcaat acccccccta 17460 
aggaccggaa caccggctcg actccgcggc gggccggcgt cagcccgccc gggatcggcg 17520
gcaccaacgc ccatgtcgtg ccggaggaag cgcccgcggc gaagccccca gccgcggcgc 17580
cggcgcgctc tgccgagctc ttcgtcgtct cggccaagag cgcagcggcg ccggacgccg 17640 
cggcggcacg gctacgagac catctgcagg cgcaccaggg gatctcgttg ggcgacgtcg 17700 
ccctcagcct ggcgacgacg cgcagcccca tggagcaccg gctcgcgatg gcggcgccgt 17760 
cgcgcgaggc gccgcgagag gggctcgacg cagcggcgcg aggccagacc ccgccgggcg 17820 
ccgcgcgcgg ccgctgctcc ccaggcaacg tgccgaaggt ggtcttcgtc tttcccggcc 17880 
agggctctca gcgggtcggc atgggccggc agctcctggc tgaggaaccc gccccccacg 17940
cggcgcctcc ggcgtgcgac cgggccaccc aggccgaagc tggctggccg ccgctcgcgg 18000 
agctcgccgc cgacgaaggg tcctcccagc tcgagcgcat cgacgtggtg cagccggcgc 18060 
ncctcgccct cgcggtggca tttgcggcgc tgtggcggcc gtggggtgtc gcgcccgacg 18120
tcgcgaccgg ccacagcatg ggcgaggcag ccgccgcgca tgcggccggg gcgctgtcgc 18180 
ccgaggacgc ggtggcgatc atctgccggc gcagccggct gctccggcgc atcagcggtc 18240 
agggcgagat ggcggtgacc gagctgtcgc tggccgaggc cgaggcggcg ctccgaggct 18300
acgaggatcg ggtgagcgtg gccgcgagca acagcccgcg cccgacggcg ctctcgggcg 18360
agccggcagc gatcggcgag gtgctgtcgt ccctgaacgc gaagggggcg ctctgccgtc 18420
gggtgaaggt ggatgtcgcc agccacagcc cgcaggtcga cccgctgcgc gaggacccct 18480
tggcagccct gggcgggctc cggccgggtg cggctgcggc gccgatgcgc tcgacggtga 18540
cgggcgccat ggtagcgggc ccggagctcg gagcgaatta ccggacgaac aacctcaggc 18600
agccagtgcg cttcgccgag gtagtccagg cgcagctcca aggcggccac ggtctgttcg 18660
tggagatgag cccgcatccg accctaacga cttcggtcga ggagatgcgg cgcgcggccc 18720
agcgggcggg cgcagcggtg ggcccgctgc ggcgggggca ggacgagcgc ccggcgatgc 18780
tggaggcgct gggcacgccg tgggcgcagg gctaccctgt accccggggg cggctgttcc 18840
ccgcgggggg gcggcgggta ccgctgccga cctacccccg gcagcgcgag cggtaccgga 18900
tcgaagcgcc ggccaagagc gccgcgggcg atcgccgcgg cgcgcgcgcg ggcggtcacc 18960
cgctcctcgg tgaaatgcag accctgtcaa cccagacgag cacgcggctg tgggagacga 19020
cgctggatct caagcggctg ccgtggctcg gcgaccaccg ggcgcaggga gcggtcgtgt 19080
ttccgggcgc ggcgtacctg gagatggcga tttcgtcggg ggccgaggct ttgggcgacg 19140
gccctttgca gataactgac gtggtgctcg ccgaggcgct ggccctcgcg ggcgacgcgg 19200
cggcgttggt ccaggtggtg acgacggagc agccgtcggg gcggccgcag ttccagaccg 19260
cgagccgggc gccgggcgct ggccacgcgt ccttccgggt ccacgctcgc ggcgcgctgc 19320
tccgagtgga gcgcaccgag gccccggccg ggctcacgcc ttccgccgng cgcgcgcggc 19380
tccaggccag cacacccgcc gcggccacct acgcggagcc gaccgagacg gggctgcagt 19440
acggccctgc cttccagggg attgctgagc tacggcgggg cgaaggcgag gcgctgggac 19500
gggcacgccc gcccgacgcg gccggctcgg cagcggagta tcggctgcac cctgcgctgc 19560
tggacgcgtg cctccagacc gtcggcagcc tcttcgcccg cagcggcgag gcgacgccgc 19620
gggtgcccgt ggagttgggc tcgctgcggc tcttgcagcg gcctccgggg gagctgcggc 19680
gccacgcgcg cgtcgtgaac cacgggcacc aaacccccga ccggcagggc gccgaccttc 19740
gggcggtcga cagctcgggc gcagtggtcg ccgaagttcg cgggctcgtg gcgcagcggc 19800
ctccgggagg ggtgcgccgg cgcgaagaag acgattggtt cctggagccc gagcgggaac 19860
ccgcagcggt cggcacagcc aaggtcaacg cgggccggtg gctgcccctc ggcggcggcg 19920
gcgggctcgg cgccgcgttg cgcgcgacgc tggaggccgg cggccacgcc gccgcgcacg 19980
cggcagagaa caacacgagc gccgccggcg tacgcgcgct cccggcaaag gcccttgacg 20040
gccaggctcc gacggcggtg gtgcacctcg gcagcctcga tgggggcggc gagcccgacc 20100
cagggctcgg ggcgcaaggc gcattggacg cgccccggag cgccgacgtc agtcccgacg 20160
cccccgatcc ggcgctggta cgtggctgcg acagcgtgct ccggaccgtg caggccctgg 20220
ccggcatggg cctccgagac gccccgcgat tgtggctttt gacccgcggc gcacaggccg 20280
tcggcgccgg cgacgtcccc gcgacacagg caccgccgct ggggccgggc cgcgtcatcg 20340
ccatggagca cgcggacctg cgctgcgctc gggtcgacct cgacccagcc cggcccgagg 20400
gggagcccgc cgccctgctg gccgagccgc tggccgacga cgccgaagcg gaagtcgcgc 20460
tgcgcggtgg cgagcgacgc gtcgctcgga tcgcccgccg gcagcccgag acccggcccc 20520
gggggaggat cgagagctgc gttccgaccg acgccaccat ccgcgcggac agcacctacc 20580
ccgtgaccgg cggtctgggt gggcccggtc tgagcgcggc cggacggccg gccgagcgcg 20640
gcgccggtca cccggtgccg gtgggccgct ccggcgcggc gagcgcggag caacgggcag 20700
ccgtcgcggc gcccgaggcc cgcggcgcgc gcgtcaccgt ggcgaaggcg gacgtcgccg 20760
accgggcgca gcccgagcgg atcctccgcg aggtcaccac gncggggacg ccgctgcggg 20820
gcgtcgtcca tgcggccggc atcttggacg acgggccgcc gacgcagcag actcccgcgc 20880
ggtttcgtaa ggtgatggcg cccaaggtcc agggggccct gcacccgcac gcgttgacgc 20940
gcgaagcgcc gctttccctc ttcgtgctgt acgcttcggg agcagggccc ttgggcccgc 21000
cgggccaggg caactacgcc gcggccaaca cgttcctcga cgctctggcg caccaccgga 21060
gggcgcaggg gccgccagcg ttgagcgtcg actggggcct gtccgcggag gcgggcatgg 21120
cggccgcgca ggaagatcgc ggcgcgcggc tggtcccccg cggaatgcgg agcctcaccc 21180
ccgacgaggg gctgtccgct ctggcacggc tgcccgaaag cggccgcgct caggtggggg 21240
tgatgccggt gaacccgcgg ctgtgggtgg agctctaccc cgcggcggcg tccccgcgaa 21300
cgtcgtcgcg cccggtgacg gcgcatcgcg cgagcgccgg cgggccagcc ggggacgggg 21360
acctgctccg ccgcctcgcc gctgccgagc cgagcgcgcg gagcgcgccc ctggagccgc 21420
xcctccgcgc gcagaccccg caggtgctgc gcctccccga gggcaagatc gaggtggacg 21480
ccccgcccac gagcctgggc acgaactcgc tgatggggcc cgagccgcgc aaccgcatcg 21540
aggccatgct gggcatcacc gtaccggcaa cgctgttgtg gacctacccc acggcggcgg 21600
cgctgagcgg gcatctggcg cgggaggcat gcgaagccgc tcctgtggag tcaccgcaca 21660
ccaccgccga ccctgccgtc gagatcgagg agacgtcgca ggacgatctg acgcagttga 21720
tcgcagcaaa actcaaggcg cttacangac tactcgcggt cctacggcac agcagaatcc 21780
gctgaaacaa gcggccatca tcattcagcg gctggaggag cggctcgctg ggctcgcaca 21840
ggcggagctg gaacggaccg agccgatcgc catcgtcggc atcggctgcc gcttccctgg 21900
cggtgcggac gccccggaag cgttttggga gctgcccgac gcggagcgcg acgcggtcca 21960
gccgcccgac acgcgctggg cgctggtggg cgtcgctccc gccgaggccg tgccgcactg 22020
ggcggggctg cccaccgagc cgatagattg cctcgatgct gcgttccccg gcatctcgcc 22080
ccgggaggcg cgaccgcccg acccgcagca tcgtctgttg ccggaggccg cttgggaggg 22140



WO 99/66028 PCT/EP99/04171



gctcgaggac 
cggcgctttc 
cgcgtacagc 
ggggttgcag 
ccacctcgcc 
cagcgcgctc 
cgatggtcgt 
tggcctggtc 
gctgatccgg 
cgtgctggct 
ggccgtcgat 
cgaggcgctg
cgcggtgaag 
ggcagcgctt 
cccgcggatc 
gcgcacggac 
gcatgtggtg 
ggcggagctt 
gccgcgcgag 
ggcgacgacg 
gccgcrggcg 
catcgcgagc 
gccgggcatg 
gcgcgtggcg 
ggcggggagc 
cgcggcggag 
ggttgggcat 
agatggggtg 
' cgcgacggtg 
ggcgtcggtg 
gcaagcggtg 
gcatgtcccg 
ggtggcggcg 
gaaggtggtc 
ggtgcgcttc 
agtgggcccg 
gacgctgctg 
gggcaggctg 
gcggcgggtg 
ggccgaaggg 
cnggcccgag 
ggcgccggcc 
acgcccgcgc 
ccaggccctc 
cgtcgtggag 
gccggtgctc 
cgcgacccga 
gctgtggggt 
ggacctggat 
gccggacgcc 
ggccgcccca 
ggcgacgggt 
ggcggggcac 
agaccagccg 
cgcgcgggcc 
ggcggccgtc 
gctgccggcc 
ggcatgggcg 
ctcggcgtcg 
cttggacgcg 
gggcctgtgg 
gggaacctgg 
gcgcgcgacg 
cgacgcgagc 
cccggccgcg 



gccggtatcc 
acggcggact 
gccaccggca 
ggaccttgcc 
cgccgcagcc 
ctctcccccg 
tgccggacct 
gtcctcaaac 
ggctcggcca 
caggagacgg 
tacgtcgaga 
cgggcgacgg 
accaacatcg 
tcgctgacgc 
cggcccgagg 
cgcccgcgcc 
ctggaagagg 
ttggtgctgt 
cacctggaca 
cgcagcgcga 
gcgctctcgg 
tcctcgcgcg 
ggccgggggc 
ctgttcgacc 
gccgagccgt 
tacgcgctga 
agcatcgggg 
aggctcgtgg 
tcgctcggag 
tcgatcgcgg 
caggcgatcg 
cacgegttcc 
tcggtgacgt 
acggacgagc 
gcggacgggg 
aagccgacgc 
gcgtcgttgc 
tgggccgccg 
ccgctgccga 
ctcggagcca 
atgcctcgct 
gaccggggtg 
gccgcgctcc 
ggtggccgca 
gcgggggcat 

gcgctgattc 
ggggcctgca 
atgggccggg 
ccggaggaga 
gaggatcagc 
ccggagggaa 
gggccgggcg 
cttgtigctga 
ccagaggtgc 
accgcggcgg 
gagccgccgc 
caccaggacg 
ctgcacaccc 
ggcgtcttcg 
ctggcggacc 
gcggaggggg 
gcgatgccga 
cagcgcgtgg 
cgaggccgct 
ccagctgtgg 



cgccccggtc 
acgcgcgcac 
acacgctcag 
tgaccgtcga 
tgcgcgcagg 
acatgatgga 
tcgatgcctc 
ggctctccga 
ccaaccacga 
tcxtgcgcga 
cccacggaac 
tggggccggc 
gccatctcga 
acgagcgcat 
gcagcgcgct 
tcgcgggggt 
cgccggcggt 
cgggcaagag 
tgcacccgga 
cgagccaccg 
ccgtggcgca 
gcaagccggc 
tttgcgcggc 
gggagctgga 
tgtcgctcga 
cggcgctgtg 
agctggtggc 
cggcgcgcgg 
cgccggaggc 
cggccaacgg 
cggcggggcc 
actcgccgcc 
accggcggcc 
tgagcgcgcc 
tgaaggcgct 
tgctcgg;gct 
gcgccgggcg 
gcggctcggt 
cctacccgtg 
cggccgccga 
cacccgtgga 
gagccgggga 
atgcgcccgc 
acgactggca 
cggccgaaga 
aggcgctcgg 
cggtgggcgg 
tcgcggcgct 
gcccgacgga 
tggcattccg 
acgcagcgcc 
cccttggcct 
tcagccggca 
gcgc^cgcat 
cggccgacgc 
tgcggggggt 
ctggtcggct 
ttacccgcga 
gcccgatcgg 
tccgccgaac 
ggatgggctc 
cgagtcgggc 
tcacccagat 
tctgggatcg 
agcgccggcg 



catcgacggg 
ggtcgctcgg 
catcgccgcc 
cacggegtgc 
agagagcgat 
agccgcggcg 
ggccaacggg 
cgcgcaacgg 
cggccggtcg 
ggcgctgcgg 
agggaccccg 
gcgccccgac 
ggccgcggca 
cccgagaaac 
cgcgttggcg 
gagctcgtcc 
ggagctgtgg 
cgagggggcg 
gctcgggccc 
gctcgcggcg 
ggggcagacg 
gttcccgttc 
gcggccagcg 
ccgcccgctg 
ccagacggcg 
gcggtcgtgg 
ggcgtgcgcg 
gcggctgatg 
ggaggtggcg 
gccggagcag 
cgcggcgcgc 
gacggaaccg 
aagcgtttcg 
ggggtactgg 
gcacgaagcc 
gctgccagcc 
cgaggaggcc 
cagctggccg 
gcagcggcag 
tgcgccggcg 
cccgcggcga 
ggcggccgcg 
cgaggcctcc 

gggggtgcng 

ggtcgccaaa 
cacggggccg 
cgagcctgac 
agagcatccc 
ggtcgaggcc 
ccaggggcgc 
ggtgtcgctg 
cctcgctgcg 
cggattgccc 
tgcggcgatc 
ggccgatgcc 
agtgcacgcc 
cgcccgggtg 
gcagccgctg 
. ccagggcagc 
gcaggggctc 
gcaggcgcag 
cctggcggcg 
ggattgggcc 
gctggcaacc 
caacgcgtct 



agccgcaccg 

ccgccgcgcg 

ggacggctgt 

tcgccatcgc 

ctcgcgntgg 

cgcacgcaag 

ttcgtccgtg 

gatggcgacc 

accgggccga 

agcgcccacg 

ccgggcgatc 

ggcacacgcc 

ggcgtagcgg 

ctcaacttcc 

accgagccgg 

gggatgagcg 

cccgccgcgc 

ctcgatgcgc 

ggggacgcgg 

gcggcgacgc 

ccggcggggg 

accggacagg 

ttccgggagg 

cgcgaggcga 

ttcacccagc 

ggcgtagagc 

gcgggggcgc 

caggggccct 

gcggcggcgg 

gcggtgatcg 

ggcgcgcgca 

atgctggagg 

ccggtgagca 

gcgcggcacg 

ggcgcgggga 

tgcctgccgg 

gcgggggcgc 

ggcgtcttcc 

cggtactgga 

cagtggtccc 

gcccggcccg 

gcggcgcttt 

gcggtcgccg 

tacccgtggg. 

gtcacccatc 

cgctcacccc 

gctgcccccc 

ggctcctggg 

ccggtggccg 

cggcgcgcag 

tctgcggagg 

cggcggtcgg 

gaccgcgagg 

gaggcgctgg 

gaaggcatgg 

gcgggtctgc 

ttgcgcccca 

gacctcttcg 

tacgcggcag 

gccgccctga 

cgccgg^aac 

atggaatggc 

catgcgggag 

gccacgaaag 

gttgtggaga 



gcgtgttcgt 

aggagcgaga 

cgtacacgct 

tggcggcgac 

cgggaggggt 

cgccgccgcc 

gcgagggccg 

gcatccgggc 

ccgcgcccaa 

tcgaagctgg 

ccatcgaggc 

gcgtgctggg 

gcctgatcaa 

gcacgctcaa 

tgccgcggcc 

gaacgaacgc 

cggagcgccc 

aggcggcgcg 

cgctcagcct 

cgcgcgaggg 

cggcgcgccg 

gcgcgcagac 

cgctcgaccg 

tgtgggcgga 

ccgcgctctt 

cggagctcct 

ccccgccgiga 

ccgcgggcgg 

cgccgcacgc 

cgggcgtgga 

ccaagcggcc 

agctcgggcg 

acctgagcgg 

tgcgggaggc 

cgctcgtcga 

aggcggagcc 

tcgaggcgct 

ccacggccgg 

ccgaggcgcc 

accgggcgga 

gcgggtggct 

cgccgcaggg 

agcaggcgac 

gtccggacgc 

ctgccgcggc 

ggccccggat 

gccaggcggc 

gcgggctcgt 

agccgctttc 

cgcggctcgc 

ggagttacct 

tggagcgcgg 

aacggggccg 

aggcgcaggg 

cggcgccctt 

ccgacgacgg 

aggtggaggg 

cactgttttc 

gcaacgcctt 

gcatcgcctg 

acgaggcatc 

cgcccggcac 

cggcgccgcg 

aggcctcctc 

cccgcccggc 
gccctacgag cccgtgcgcg gcgtggtcgc cggggtgatg ggctttaccg accagggcac 26100
gcccgacgcg cgacgaggct tcgccgagca gggccccgac tccccgatgg ccgcggagat 26160 
ccgcaaacgg cctcagggcg agctgggtat gccgccgtcg gcgacgctag cgctcgacca 26220 
cccgaccgcg gagcggccgg tggaatactt gctgagccag gcgccggagc tgcaggaccg 26280 
caccgacgtg cggagcgttc ggttgccggc gacagaggac ccgaccgcca tcgcgggtgc 26340 
cgcctgccgc cccccgggcg gggtcgagga cctggagccc tactggcagc cgttgaccga 26400 
gggcgcggcg gccagcaccg aggtgccggc cgaccggtgg aacggggcag acgggcgcgt 26460 
ccccggcccg ggagaggcac agagacagac ccacgcgccc aggggcggcc ttctgcgcga 26520 
ggcggagacg cccgacgcgg cgttcttcca catctcgcct cgggaggcga cgagcccgga 26580 
cccgcaacag cggctgccgc tggaagcgag ctgggaggcg accgagcgcg cgggccagga 26640 
cccgtcggcg ccgcgcgaga gccccacggg cgtgttcgtg ggcgcgggcc ccaacgaaca 26700 
tgccgagcgg gcgcaggaac tcgccgatga ggcggcgggg ccccacagcg gcaccggcaa 26760 
catgctcagc gttgcggcgg gacggctatc atttttcctg ggcctgcacg ggccgaccct 26820 
ggccgcggac acggcgtgct cctcgtcgct ggcggcgccg cacctcggct gccagagccc 26880 
gcgacggggc gagcgcgacc aagccctggt tggcggggcc aacatgccgc tctcgccgaa 26940 
gacctccgcg ccgccctcac ggacgcacgc actttcgccc ggcgggcggc gcaagacgcc 27000 
cccggccgac gcggacggcc acgcgcgggc cgagggccgc gccgcggcgg tgctcaagcg 27060 
gccctccgac gcgcagcgcg accgcgaccc catcctggcg gcgacccggg gtacggcgat 27120 
caaccacgac ggcccgagca gcgggccgac agtgcccagc ggccccgccc aggaggcgct 27180 
gtcacgccag gcgdcggcgc acgcaggggc ggccccggcc gacgccgatt tcgcggaacg 27240 
ccacgggacc gggacggcgc tgggcgaccc gatcgaggtg cgcgcgccga gcgacgtgca 27300 
cgggcaagcc cgccctgcgg accgaccgct gatcccggga gccgccaagg ccaaccccgg 27360 
gcacacggag cccgcggcgg gcccggccgg cccgcccaag gcggcgcccg cgccggggca 27420 
agagcaaaca ccagcccagc cggagctggg cgagcccaac ccgcccctgc cgtgggaggc 27480 
gccgccggcg gcggtggccc gcgcagcggt gccgcggccg cgcacggacc gcccgcgcct 27540 
cgcgggggtg agcccgctcg ggatgagcgg aacgaacgcg catgtggtgc cggaagaggc 27600 
gccggcggcg gagccgcggc ccgccgcgcc ggagcgctcg gcggagcccc cggcgccgcc 27660
gggcaagagc gagggggcgc ccgacccgca ggcggcgcgg ccgcgcgagc acccggacat 27720 
gcacccggag cccgggcccg gggacgcggc gtccagcccg gcgacgacgc gcagcgcgac 27780 
gaaccaccgg cccgcggcgg cggcgacgcc gcgcgagggg ccgccggcgg cgcccccggc 27840 
cgcggcgcac gggcagacgc cgccgggggc ggcgcgccgc accgcgagcc cgccgcgcgg 27900 
caagccggcg cccccgccca ccggacaggg cgcgcagacg ccgggcatgg gccgggggcc 27960 
ccgcgcggcg cggccagcgc cccgggaggc gcccgaccgg cgcgcggcgc cgcccgaccg 28020
ggagccggac cgcccgccgc gcgaggcgac gcgggcggag ccggggagcg ccgagccgcc 28080 
gccgcccgac cagacggcgc tcacccagcc cgcgctcttc acggcggagc acgcgccgac 28140 
ggcgccgcgg cggccgcggg gcgtagagcc ggagccggcg gccgggcata gcgccgggga 28200 
gccggcggcg gcgcgcgcgg cgggggcgcc cccgccggaa gatggggcga ggctcgcggc 28260 
ggcgcgcggg cggccgacgc aggggccccc ggcgggcggc gcgacggcgc cgctcggagc 28320 
gccggaggcg gaggcggcgg cggcggcggc gccgcacgcg gcgccggcgc cgatcgcggc 28380 
ggccaacggg ccggagcagg cggcgaccgc gggcgcggag caagcggcgc aggcgaccgc 28440 
ggcggggccc gcggcgcgcg gcgcgcgcac caagcggccg catgtcccgc acgcgcccca 28500 
cccgccgctg acggaaccga tgctggagga gcccgggcgg gcggcggcgt cggcgacgca 28560 
ccggcggcca agcgccccgc cggcgagcaa cccgagcggg aaggcggccg cggacgagcc 28620
gagcgcgccg gggcaccggg tgcggcacgc gcgggaggcg gtgcgcctcg cggacggggc 28680 
gaaggcgccg cacgaagccg gcgcgggcac gctcgccgaa gtgggcccga agccgacgcc 28740 
gcccgggccg ccgccagccc gcccgccgga ggcggagccg acgccgccgg cgccgctgcg 28800 
cgccgggcgc gaggaggccg cgggggcgct cgaggcgccg ggcaggccgc gggccgccgg 28860 
cggcccggcc agccggccgg gcgcctcccc cacggccggg cggcgggtgc cgctgccgac 28920 
ctacccgtgg cagcggcagc ggcactggcc cgacaccgag cccgacagcc gccgccacgc 28980 
agccgcggac ccgacccaag gctggctcta ccgcgcggac cggccggaga taccccgcag 29040 
cccccagaaa ccagaggagg cgagccgcgg gagccggccg gcaccggcgg acaagggcgg 29100 
agccggcgag gcggccgccg cagcgccgcc gacacgcgga cccccacgcg ccgcgcccca 29160 
cgcgccggca gagacacccg cgaccgccga gccggtgacc gaggctgccg gcggccgaag 29220 
cgaccggcag gcagcgcccc acccgcgggg tccggacgcc gccgccggcg cggaggcgcc 29280 
gaccgacgag accggcgacg cgacccgccg tgccaccgcg ccggcgctcg gcccggcccg 29340 
gcccccgagc accgcgcccc gcccgccccg accccgggcc gcgacccggg gggcacgcat 29400 
cgccggcgac gagcccgcga ccgccccctg tcaggcggcg ccacggggca tgggccgggt 29460 
ggcggcgccc gagcaccccg gggcccgggg cgggcccgcg gacctggacc cccgagcgag 29520 
cccgccccaa gccagcccga tcgacggcga gacgcccgcc accgagccat cgtcgcagga 29580 
gaccgaggac cagcccgcct tccgccacgg gcgccggcac gcggcacggc cggtggccgc 29640 
cccgccacag gggcaagcgg caccggtgtc gctgcccgcg gaggcgagcc acccggtgac 29700
gggaggcccc ggcgggccgg gcccgatcgt ggcccagcgg ctggtggagc cgggagcgcg 29760
gcacccggcg ccgaccagcc ggcgcgggct gcccgacccg caggcgtggc gcgagcagca 29820 
gccgcccgag acccgcgcgc ggaccgcagc ggccgaggcg ccggaggcgc ggggcgcacg 29880 
ggcgaccgcg gcagcggcgg acgcggccga cgtcgaaccg acgacagcgc tggccccgcc 29940 
ggccgagccc ccgctgcgag gggtggtgca cgccgctggc gtcagcgtca cgcgcccacc 30000
ggcggagacg gacgagaccc cgcccgagtc ggtgctccgt cccaaggtgg ccgggagccg 30060
gctgccgcac cggctgctgc acggccggcc cctcgacctg tncgcgctgt ccccgccggg 30120
cgcagcggrg tggggtagcc atagccaggg tgcgtacgcg gcggccaacg ctctccccga 30180
cgggctcgcg cacctccggc gcccgcaacc gctgcctgcg ttgagcgtcg cgcggggtct 30240
gtgggccgag ggaggcatgg cggacgcgga ggctcatgca cgcccgagcg acaccggggc 30300
tccgcccatg tcgacgccgg cagcgccgcc ggcgctccag cgcccggcgg agaccggcgc 30360
ggctcagcgc acggcgaccc ggacggaccg ggcgcgcttc gcgccggcgc acaccgctcg 30420
agggcgtcgc aacctgcccc cggcgccggc cgcagggcgc gacaccaccg cgccctcccc 30480
cccggcggca gcaacccgga accggcgcgg cctgcccgcc gcggaagccc gcgtggctcc 30540
gcacgagatc gtccacgggg ccgtcgcccg ggtgccgggc cccctcgacc cgagcgcgcc 30600
cgatcctggg acggggtcca acgagcaggg cctcgacccg ctgacggcgg cggagacccg 30660
caacctcctt caggccgagc cggacgcgcg gcttccgacg acgccggcct tcgaccaccc 30720
gacggtacag cggccggcgg agcatctgct cgtcgatgca ctgaagccgg aggatcgcag 30780
cgacacccag catgtccggc cgccggcgcc agacgagccc atcgccatcg cgggagccgc 30840
ctgccgcccc ccgggcgggg tggaggaccc ggagtcccac tggcagccat cggccgaggg 30900
cgcggcggtc agcgccgagg cgccggccga ccggtgggac gcggcggact ggcacgaccc 30960
tgatccggag atcccaggcc ggacctacgc gaccaaaggc gcctccctgc gcgatttgca 31020
gagattggac gcgaccttcc cccgcacccc gccccgcgag gcgatgagcc tcgacccgca 31080
gcagcggccg ctcccggagg caagccggga agcgctcgag agcgcgggca ccgccccgga 31140
cacgccgcga gacagcccca ccggggcgct cgcgggcgcg gggcccaacg agcaccacac 31200
gcagcggccg cgaggcttca ccgacggagc ggcagggttg tacggcggca ccgggaacac 31260
gcccagcgcc acggccggac ggccgccgct ctccctgggc ctgcacggcc cgacgccggc 31320
cacggacacg gcgtgctcgc cacccccggc cgcgctgcac ctcgcctgcc agagcctgcg 31380
actgggcgag tgcgaccaag cgctggttgg cggggccaac gcgctgcccg cgccggagac 31440
ccccgcgccg cccccacgga cgcgcgcgcc ttcgcccgac gggcggcgca agacgctccc 31500
ggccgacgcg gacggctacg cgcggggcga ggggcgcgcc gtggtggcgc tcaagcggcc 31560
gcgcgacgcg cagcgcgccg gcgaccccac cctggcgctg atccggggaa gcgcggcgaa 31620
ccacgacggc ccgagcagcg ggctgaccgt acccaacgga cccgcccagc aagcatcgct 31680
gcgccaggcg ccctcgcaag caggcgcgcc cccggccgac gccgactccg cggagcgtca 31740
cgggacaggg acggcgccgg gcgacccgac cgaggcgcag gcgctgagcg aggcgcacgg 31800
cccagggcgc cccggggacc gaccgccggc gctgggggcc gccaaggcca acgccgcgca 31860
cccggaggcg gcatctggct cggccagccc gctcaaggcc gcgcctgcgc tgcggcacga 31920
gcagaccccg gcccagccgg agccggggga gctcaacccg cacttgccgc ggaacacgct 31980
gccggcggcg gcgccacgca aggcggtgcc gtgggggcgc ggcgcacgcc cgcgccgggc 32040
cggcgcgagc gcgcccgggc cgagcggaac caacgtgcat gccgcgctgg aggaggcacc 32100
ggaggtggag ccggcgcccg cggcgccggc gcgaccggcg gagccggccg tgccaccggc 32160
caagagcgcg acggcgctgg acgccgcggc ggcacggccc ccggcgcacc cgcccgcgca 32220
cccggagccg agccccggcg acgcggcgtc cagcctggcg acgacgcgca gcccgacgga 32280
gcaccggctc gccaccgcga cgaccccgcg cgaggccctg cgaggcgcgc cggacgccgc 32340
ggcgcagcaa aagacgccgc agggcgcggt gcgcggcaag gccgtgcccc cacgcggcaa 32400
gccggccttc ctgctcaccg gacagggcgc gcaaacgccg ggcacgggcc gcgggccgta 32460
cgaaacgtgg cctgcgcccc gggaggcgtt cgaccggcgc gtggcgcccc tcgaccggga 32520
gatcgaccag cccctgcgcg aggcgacgcg ggctgcgccg ggccccgccc aggcggcgcg 32580
gcccgaccag accgcgtacg cgcagccggc cctctttgcg ctggagcacg cgccggccgc 32640
cctgtggcgt ccgcggggcg cggagccgca cgtactgctc ggccacagca ccggcgagcc 32700
ggtcgccgcc tgcgcggcgg gcgcgccctc gctcgaagat gcggcgaggc cggcggccgc 32760
gcgcgggcgg ctgatgcagg cgccacccgc cggcggtgcc atggtagcca ccgcagcgtc 32820
cgaggccgag gtggccgcct ccgcggcgcc ccacgccgcc acggtgccga ccgccgcggt 32880
caacggcccc gacgccgccg cgatcgccgg cgccgaggca caggcgcccg cccccggcgc 32940
gacgcccgcg gcgcgcggga cacgcacgaa gaggctcgcc gtcccccacg cgtcccaccc 33000
gccgcccacg gacccgacgc cggaagacct ccagcgggtc gccgcgacga ccgcgcaccg 33060
cgcgccagac cgcccggcgg cgccgaacgt caccggccac gccgcaggcc ccgagatcgc 33120
cacgcccgag catcgggccc ggcacgtgcg aagcgccgtg cgcctcggcg acggggcaaa 33180
ggcgccgcac gccgcgggcg ccgccacgtt cgtcgaggct ggcccgaagc cggtcctgct 33240
cgggctgtcg ccagcgcgcc ccggggaagc ggacgcggtc ctcgcgccgc cgctacgcgc 33300
ggaccgctcg gaacgcgagg cggccctcgc ggcgctcggg gcttggtacg cctggggggg 33360
cgcgcccgac cggaagggcg tgctccccga tggcgcgcgc cgcgtggccc cgcccatgta 33420
cccacggcag cgtgagcgcc attggacgga cctcaccccg cgaagcgccg cgcccgcagg 33480
gaccgcaggc cgccggccgc cggccggtgc cgggctctgc atgcccggcg ccgcgntgca 33540
ccacgcgccc ccgaccggac cacgccatca gcccttcctc ggcgaccacc ccgcgtttgg 33600
caaggcggcg gcgcccggcc cctcccacgt cgcggtgatc ctcagcatcg ccgccgagcg 33660
ccggcccgag cgggcgaccc agccgacagg cgtggagttc ccgaaggcca tcgcgacgga 33720
gcccgaccag gaggtcgagc cccacgccgc gctcaccccc gaagccgccg gggacggcta 33780
cctgcccgag ccggcgaccc cggcggcgcc ggagaccgaa cgccgacgga cgacccacgc 33840
ccgcggtcgg gtgcagccga cagacggcgc gcccggcgcg ttgccgcgcc tcgaggtgct 33900
ggaggaccgc gcgatccagc ccctcgactt cgccggatcc ctcgacaggt tatcggcggt 33960 
gcggatcggc tggggtccgc tttggcgatg gctgcaggac gggcgcgtcg gcgacgaggc 34020 
ctcgctcgcc acccccgtgc cgacccatcc gaacgcccac gacgtggcgc ccttgcaccc 34080 
gatcctgctg gacaacggct ttgcggtgag cctgctgtca acccggagcg agccggagga 34140 
cgacgggacg cccccgctgc cgttcgccgt ggaacgggtg cggtggtggc gggcgccggt 34200 
tggaagggtg cggtgcggcg gcgtgccgcg gtcgcaggca ttcggtgcct cgagcttcgt 34260 
gcnggccgac gaaactggcg aggtggccgc cgaggtggag ggatttgttt gccgccgggc 34320 
gccgcgagag gtgctcctgc ggcaggagtc gggcgcgtcg accgcagccc tgtaccgcct 34380 
cgactggccc gaagcgccct tgcccgatgc gcctgcggaa cggatcgagg agagccgggt 34440 
cgtggcggca gcacctggct cggagatggc cgcggcgctc gcaacacggc tcaaccgccg 34500 
cgtcctcgcc gaacccaaag gcctcgaggc ggccctcgcg ggggtgtctc ccgcaggcgt 34560 
gatctgcctc tgggaggctg gagcccacga ggaagctccg gcggcggcgc agcgcgcggc 34620
gaccgagggc ctctcggtgg tgcaggcgct cagggaccgc gcggtgcgcc tgtggtgggc 34680 
gaccatgggc gcagtggccg tcgaggccgg tgagcgggcg caggtcgcca cagcgccggc 34740 
atggggcccc ggccggacag tgatgcagga gcgcccggag ctcagctgca ctctggcgga 34800 
ttcggagccg gaggccgacg cagcgcgccc agctgacgtt ctgttgcggg agctcggtcg 34860 
cgctgacgac gagacacagg tggctttccg ttccggaaag cgccgcgtag cgcggccggc 34920 
caaagcgacg acccccgaag ggctcctggc ccccgacgca gagtcctacc gactggaggc 34980 
tgggcagaag ggcacattgg accagccccg ccccgcgccg gcacagcgcc gggcacctgg 35040 
cccgggcgag gtcgagatca aggcaaccgc cccggggctc aacttccgga ccgtccccgc 35100 
tgcgcnggga atgtatccgg gcgacgccgg gccgatgggc ggagattgtg ccggtgtcgc 35160 
cacggcggtg ggccaggggg tgcgccacgc cgcggccggc gatgctgtca tgacgctggg 35220 
gacgctgcac cgatccgtca cggtcgacgc gcggctggtg gtccggcagc ctgcagggct 35280 
gacccccgcg caggcagcta cggtgccggt cgcgttcctg acggcctggc tcgctctgca 35340 
cgacccgggg aacctgcggc gcggcgagcg ggcgctgatc catgctgcgg ccggcggcgt 35400 
gggcacggcc gcggtgcaaa tcgcccgatg gataggggcc gaggtgttcg ccacggcgag 35460 
cccgtccaag tgggcagcgg cccaggccac gggcgtgccg cgcacgcaca tcgccagctc 35520 
gcggacgccg gagtctgctg agacgctccg gcaggtcacc ggcggccggg gcgtggacgt 35580 
ggcgcccaac gcgctggccg gcgagttcgt ggacgcgagc ctgtccctgc tgtcgacggg 35640 
cgggcggttc cccgagacgg gcaagaccga catacgggac cgagccgcgg tcgcggcggc 35700 
gcatcccggt gcccgctacc gggtattcga catcctggag ctcgctccgg atcgaacccg 35760 
agagaccccc gagcgcgcgg tcgagggcct tgctgcggga catctgcgcg cattgccggc 35820 
gcatgcgrttc gcgatcacca aggccgaggc agcgtttcgg ttcatggcgc aagcgcggca 35880 
tcagggcaag gtcgcgctgc tgccggcgcc ctccgcagcg cccttggcgc cgacgggcac 35940 
cgtaccgccg accggcgggc tgggagcgtt ggggctccac gtggcccgct ggctcgccca 36000 
gcagggcgtg ccgcacatgg cgctcacagg tcggcggggc ccggacacgc cgggcgctgc 36060 
caaagccgtc gcggagaccg aagcgcccgg cgctcgggtg acgatcgcgg cgtcggatgt 36120 
cgccgatcgg aacgcgctgg aggccgtgcc ccaggccatt ccggcggagt ggccgtcaca 36180 
gggcgtgacc cacgcagccg gfagcgcLcga cgacggcgcg cttgatgagc agaccaccga 36240 
ccgcttctcg cgggtgctgg caccgaaggt gaccggcgcc tggaatctgc atgagctcac 36300 
ggcgggcaac gatctcgctt tcttcgtgct gttctcctcc atgtcggggc tctcgggccc 36360 
ggccgggcag tccaactatg cggcggccaa caccttcctc gacgcgctgg ccgcgcatcg 36420 
gcgggccgaa ggcctggcgg cgcagagcct cgcgtggggc ccatggtcgg acggaggcac 36480 
ggcagcgggg ctcagcgcgg cgccgcaggc gcggctcgct cggcacggga tgggagctct 36540 
gtcgccggct cagggcaccg cgctgctcgg gcaggcgctg gctcggccgg aaacgcagcc 36600 
cggggcgatg tcgctcgacg tgcgtgcggc aagccaagct tcgggagcgg cagcgccgcc 36660 
tgtgtggcgc gcgtcggcgc gcgcggaggc gcgccatacg gcggctgggg cgcagggggc 36720 
actggccgcg cgcctcgggg cgctgcccga ggcgcgtcgc gccgacgagg tgcgcaaggt 36780 
cgtgcaggcc gagatcgcgc gcgtgctctc atggagcgcc gcgagcgccg tgcccgtcga 36840 
tcggccgctg ccggactcgg gcctcgactc gctcacggcg gcggagctgc gcaacgtgcc 36900 
cggccagcgg gcgggcgcga cgctgccggc gacgctggca ttcgaccacc cgacggccga 36960 
cgcgcccacg cgccggccgc ccgataaggc cctggccgtg gccgagccga gcgcatcgtc 37020 
cgcaaagtcg tcgccgcagg tcgccctcga cgagcccact gccatcatcg gcatcggccg 37080 
ccgcttccca ggcggcgtgg ccgatccgga gtcgtcttgg cggccgctcg aagagggcag 37140 
cgatgccgtc gtcgaggtgc cgcacgagcg atgggacatc gacgcgttct atgatccgga 37200 
tccggatgtg cgcggcaaga tgacgacacg ctttggcggc ttcctgtccg acatcgaccg 37260 
gctcgacccg gccttcttcg gcatctcgcc gcgcgaagcg acgaccatgg atccgcagca 37320 
gcggctgccc ctggagacga gctgggaggc gttcgagcgc gccgggattt tgcccgagcg 37380 
gctgatgggc agcgacaccg gcgtgctcgc ggggcccttc taccaggagt acgctgcgct 37440 
cgccggcggc atcgaggcgr tcgatggcta tctaggcacc ggcaccacgg ccagcgtcgc 37500 
cccgggcagg atcccttatg tgctcgggct aaaggggccg agcctgacgg tggacaccgc 37560 
gtgctcctcg tcgctggtcg cggtgcacct ggcctgccag gcgctgcggc ggggcgagcg 37620 
ttcggcggcg ccggccggcg gcgtggcgct gatgcccacg ccggcgacgt tcgtggagct 37680 
cagccggccg cgaggcctgg ctcccgacgg acggcgcaag agcctctcgg ccgcagccga 37740 
cggcgtgggg tggagcgaag gctgcgccat gctcctgctc aaaccgcctc gcgatgcgca 37800
gcgcgatggg gacccgatcc tggcggtgat ccgcggcacc gcggcgaacc aggatgggcg 37860 
cagcaacggg ctgacggcgc ccaacgggtc gtcgcagcaa gaggtgatcc gtcgggcccc 37920 
ggagcaggcg gggctggctc cggcggacgc cagctacgtc gagtgccacg gcaccggcac 37980 
gacgttgggc gaccccatcg aagtgcaggc cctgggcgcc gcgctggcac aggggcgacc 38040 
ctcggaccgg ccgctcgtga tcgggtcggt gaagtccaat accggacata cgcaggctgc 38100 
ggcgggcgtg gccggtgtca tcaaggtggc gctggcgctc gagcgcgggc ttatcccgag 38160 
gagcctgcat ttcgacgcgc ccaatccgca cattccgtgg tcggagcccg ccgtgcaggc 38220 
ggccgccaaa cccgtcgaac ggacgagaaa cggcgtgccg cgacgagccg gggtgagctc 38280 
gtttggcgtc agcgggacca acgcgcacgt ggtgctggag gaggcgccag cggcggcgtt 38340 
cgcgcccgcg gcggcgcgtt cagcggagct cctcgtgctg tcggcgaaga gcgccgcggc 38400 
gctggacgcg caggcggcgc ggcccccggc gcacgtcgtc gcgcacccgg agctcggcct 38460
cggcgacctg gcgttcagcc cggcgacgac ccgcagcccg atgacgcacc ggctcgcggt 38520 
ggcggcgacc tcgcgcgagg cgctgtctgc cgcgctcgac acagcggcgc aggggcaggc 38580 
gccgcccgca gcggcccgcg gccacgcttc cacaggcagc gccccaaagg tggttttcgt 38640 
ctttcctggc cagggctccc agtggctggg catgggccaa aagcccccct cggaggagcc 38700 
cgtcttccgc gacgcgctct cggcgcgtga ccgagcgatt caggccgaag ccggctggcc 38760
gccgcccgcc gagctcgcgg ccgacgagac cacctcgcag ctcggccgca tcgacgcggc 38820 
gcagccggcg ctgcccgcga tcgaggtcgc gctgtcggcg ctgtggcggc cgtggggcgt 38880 
cgagccggat gcagcggtag gccacagcac gggcgaagtg gcggccgcgc acgtcgccgg 38940
cgccctgtcg ctcgaggatg ctgcagcgat catctgccgg cgcagcccgc cgctgcggcg 39000 
gatcagcggc caaggcgaga tggcggtcgt cgagccttcc ctggccgagg ccgaggcagc 39060
gctcctgggc tacgaagacc ggctcagcgt ggcggtgagc aacagcccgc gctcgacggt 39120 
gctggcgggc gagccggcag cgcccgcaga ggtgctggcg atccttgcgg caaagggggt 39180 
gctctgccgt cgagtcaagg tggacgtcgc cagccacagc ccacagatcg acccgccgcg 39240 
cgacgagcta ttggcagcat tgggcgagct cgagccgcga caagcgaccg tgtcgatgcg 39300 
ctcgacggcg acgagcacga tcacggcggg cccggagccc gcggcgagct accgggcgga 39360 
caacgttcga cagccggtgc gcttcgccga agcggtgcaa tcgccgacgg aagacggtca 39420 
cgggctgtcc gtggagacga gcccgcaccc gaccctgacg acatcggtcg aggagatccg 39480
acgggcgacg aagcgggagg gagccgcggt gggctcgtxg cggcgtggac aggacgagcg 39540 
cctgcccacg ctggaggcgc tgggagcgcc ccgggtacac ggccaggcgg cgggctggga 39600 
gcggctgtxc tccgcgggcg gcgcgggcct ccgtcgcgtg ccgctgccga cctatccctg 39660 
gcagcgcgag cggcaccggg ccgacgcgcc gaccggcggc gcggcgggcg gcagccgctt 39720 
tgctcatgcg ggcagccacc cgctcccggg tgaaatgcag accctgtcga cccagaggag 39780 
cacgcgcgtg tgggagacga cgctggatct caaacggctg ccgtggctcg gcgatcaccg 39840 
ggcgcagggg gcggtcgtgt tcccgggcgc ggcgtacctg gagatggcgc cttcgtccgg 39900 
ggccgaggcc ttgggcgacg gtccgctcca ggtcagcgac gcggtgctcg ccgaggcgct 39960
ggccttcgcg gatgatacgc cggcggcggt gcaggtcatg gcgaccgagg agcgaccagg 40020 
ccgcccgcaa ttccacgctg caagccgggt gccgggccac ggcggcgctg cctttcgaag 40080 
ccatgcccgc ggggcgccgc gccagatcga gcgcgccgag gccccggcga ggctggatct 40140 
ggccgcgctt cgcgcccggc ttcaggccag cgcacccgct gcggccacct atgcggcgcc 40200 
ggccgagacg gggctcgagt acggcccagc gctccagggg cctgtcgagc tgcggcgggg 40260 
ggagggcgag gcgctgggac gtgcgcggct ccccgaggcc gccggctccc cagccgcgtg 40320 
ccggccccac cccgcgctct tggatgcgtg cttccacgtg agcagcgcct tcgctgaccg 40380 
cggcgaggcg acgccacggg tacccgtgga aatcggctcg ccgcggcggt tccagcggcc 40440 
gtcgggggag ctgtggtgtc atgcgcggag tgtgagccac ggaaagccaa cacccgaccg 40500 
gcggagtacc gacttctggg tggccgacag cacgggcgcg accgtcgccg agatctccgg 40560 
gctcgtggcg cagcggctcg cgggaggtgc acgccggcgc gaagaagacg accggctcat 40620 
ggagccggct cgggaaccga ccgcggtccc cggatccgag gncatggcgg gccggcggct 40680 
gcccatcggc tcgggcggcg ggctcggcgc tgcgctccac tcggcgctga cggaagctgg 40740 
ccactccgtc gcccacgcga cagggcgcgg cacgagcgcc gccgggtcgc aggcactctt 40800 
gacggcgtcc ttcgacggcc aggccccgac gtcggtggcg cacctcggca gcctcgacga 40860 
gcgcggcgcg ctcgacgcgg acgccccctt cgacgccgac gcgctcgagg agtcgctggt 40920 
gcgcggctgc gacagcg^gc cctggaccgc gcaggccgtg gccggggcgg gcttccgaga 40980 
tcctccgcgg ccgtggctcg cgacacgcgg cgctcaggcc accggcgccg gcgacgtctc 41040 
tgtggcgcaa gcgccgcccc tggggccggg ccgcgctatc gccttggagc acgccgagct 41100 
gcgccgcgct cggaccgacc tcgacccagc gcggcgcgac ggagaagtcg atgagctgct 41160 
tgccgagccg ccggccgacg acgccgagga ggaagrcgcg ccccgcggcg gtgagcggcg 41220
cgcggcccgg ctcgcccgaa ggctgcccga gaccgactgc cgagagaaaa tcgagcccgc 41280 
ggaaggccgg ccgtcccggc tggagatcga tgggtccggc gtgcccgacg acctggtgct 41340
ccgagccacg gagcggcgcc ctcctggccc gggcgaggtc gagatcgccg tcgaggcggc 41400 
ggggctcaac tctcccgacg cgacgagggc catggggatc taccctgggc ccggggacgg 41460 
tccggttgcg ccgggcgccg agcgccccgg ccgaattgtc gcgatgggcg aaggtgtcga 41520 
gagccctcgt accggccagg acgccgcggc cgtcgcgccc ttcagcctcg gcacccacgt 41580 
caccaccgac gcccggacgc ccgcaccccg ccccgcggcg ctgacggccg cgcaggcagc 41640 
cgcgctgccc gtcgcattca tgacggcctg gtacggtctc gtccatctgg ggaggctccg 41700
ggccggcgag cgcgtgctca tccactcggc gacggggggc accgggctcg ctgctgtgca 41760
gatcgcccgc cacctcggcg cggagatatt tgcgaccgct ggtacaccgg agaagcgggc 41820 
gtggccgcgc gagcagggga tcgcgcacgt gacggactcg cggccgctgg acntcgccga 41880 
gcaagcgccg gccgcgacga agggcgaggg ggtcgacgtc gtgttgaact cgctgtctgg 41940 
cgccgcgatc gacgcgagcc tttcgaccct cgtgccggac ggccgcctca tcgagctcgg 42000 
caagacggac atctatgcag atcgcccgcc ggggctcgct cacttcagga agagcccgtc 42060 
ccacagcgcc gtcgatcttg cgggcttggc cgtgcgtcgg cccgagcgcg tcgcagcgct 42120 
gctggcggag gtggtggacc tgctcgcacg gggagcgccg cagccgcttc cggcagagat 42180 
cccccccctc tcgcgggccg cggacgcgtt ccggaaaatg gcgcaagcgc agcatctcgg 42240 
gaagctcgcg ctcgcgctgg aggacccgga cgtgcggatc cgcgttccgg gcgaacccgg 42300 
cgccgccacc cgcgcggacg gcgcctacct cgtgaccggc ggtctggggg ggcccggcct 42360 
gagcgtggct ggatggctgg ccgagcaggg ggctgggcat ctggtgctgg tgggccgctc 42420 
cggcgcggtg agcgcggagc agcagacggc tgccgccgcg ctcgaggcgc acggcgcgcg 42480 
tgtcacggta gcgagggcag acgtcgccga tcgggcgcag atggagcgga tcccccgcga 42540 
ggctaccgcg tcggggatgc cgctccgcgg cgtcgctcat gcggccggaa tcctggacga 42600 
cgggctgctg atgcagcaaa cccccgcgcg gttccgcgcg gtcatggcgc ccaaggtccg 42660 
aggggccttg cacctgcatg cgctgacacg cgaagcgccg ctctccttcc tcgtgctgta 42720 
cgcctcggga gcagggctct tgggctcgcc gggccagggc aactacgccg cggccaacac 42780 
gccccccgac gcactggcac accaccggag ggcgcagggg ctgccagcat tgagcatcga 42840 
ctggggcctg ttcgcggacg tgggcttggc cgccgggcag caaaatcgcg gcgcacggct 42900 
ggccacccgc gggacgcgga gcctcacccc cgacgaaggg ctgtgggcgc tcgagcgcct 42960 
gctcgacggc gatcgcaccc aggccggggt catgccgttc gacgtgcggc agcgggcgga 43020 
gttctacccg gcggcggcat cttcgcggag gttgtcgcgg ctcatgacgg cacggcgcgc 43080 
ggctcccggt cggctcgccg gggatcggga cctgctcgaa cggcucgcca ccgccgaggc 43140 
gggcgcgcgg gcagggatgc tgcaggaggt cgcgcgcgcg caggtctcgc aggcgccgcg 43200 
cctcnccgaa ggcaagctcg acgtggacgc gccgcccacg agcctgggaa tggactcgct 43260 
gatggggcca gagctgcgca accgcatcga ggccgcgctc ggcatcacca tgccggcgac 43320 
cccgctgtgg acctacccca cggcggcagc gctgagtgcg catctggctt ctcacgtcgc 43380
ccctacgggg gatggggaat ccgcgcgccc gccggacaca gggagcgtgg ccccaacgac 43440 
ccacgaagtc gcttcgctcg acgaagacgg gctgctcgcg ccgattgacg agccacccgc 43500 
gcgcgcggga aagaggcgat tgcgtgacag accgagaagg ccagcccctg gagcgcttgc 43560 
gcgaggccac tctggccctt cgcaagacgc tgaacgagcg cgataccccg gagctcgaga 43620 
agaccgagcc gaccgccatc gtggggatcg gctgccgctt ccccggcgga gcgggcactc 43680 
cggaggcgtt ctgggagctg ctcgacgacg ggcgcgacgc gatccggccg ctcgaggagc 43740 
gctgggcgcc cgtaggtgtc gacccaggcg acgacgcacc gcgctgggcg gggctgccca 43800 
ccgaggccat cgacggcttc gacgccgcgc tcttcggcat cgccccccgg gaggcacggt 43860 
cgctcgaccc gcagcatcgc ctgctgctgg aggccgcctg ggaggggtcc gaagacgccg 43920 
gcaccccgcc caggtccctc gtcgggagcc gcaccggcgt gctcgccggc gcccgcgcca 43980 
cggagtacct ccacgccgcc gtcgcgcacc agccgcgcga agagcgggac gcgtacagca 44040 
ccaccggcaa cacgcccagc atcgccgccg gacggctatc gcacacgctg gggctgcagg 44100 
gaccccgcct gaccgtcgat acggcgtgct cgtcatcgct ggrggccatt caccccgcct 44160 
gccgcagcct gcgcgctcga gagagcgatc tcgcgctggc gggaggggtc aacatgcttc 44220 
tctcccccga cacgatgcga gctctggcgc gcacccaggc gccgccgccc aacggccgtt 44280 
gccagacctt cgacgcgtcg gccaacgggt rcgtccgtgg ggagggctgc ggcctgatcg 44340 
tgctcaagcg attgagcgac gcgcggcggg acggggaccg gacccgggcg ctgatccgag 44400 
gatcggccat caatcaggac ggccggccga cggggtcgac ggcgcccaac gcgctcgccc 44460 
agggggcgct cttgcgcgag gcgccgcgga acgccggcgc cgaggccgag gccatcggtt 44520 
acatcgagac ccacggggcg gcaacctcgc tgggcgaccc caccgagatc gaagcgccgc 44580 
gcgctgtggt ggggccggcg cgagccgacg gagcgcgctg cgtgccgggc gcggcgaaga 44640 
ccaacctcgg ccacctggag ggcgccgccg gcgcggcggg cccgaccaag gcgacgcctt 44700 
cgccacacca cgagcgcatc ccgaggaacc tcaaccttcg tacgctcaac ccgcggaccc 44760 
ggaccgaggg gaccgcgctc gcgttggcga ccgaaccggt gccctggccg cggacgggcc 44820
ggacgcgctt cgcgggagtg agctcgtccg ggacgagcgg gaccaacgcg cacgtggtgt 44880 
cggaggaggc gccggcggtg gagcctgagg ccgcggcccc cgagcgcgca gcggagctgt 44940 
tcgtcctgcc ggcgaagagc gcggcggcgc tggatgcgca ggcagcccgg ccgcgggacc 45000 
acctggagaa gcacgtcgag cctggcctcg gcgatgtggc gttcagcctg gcgacgacgc 45060 
gcagcgcgat ggagcaccgg ccggcggtgg ccgcgagctc gcgcgaggcg ccgcgagggg 45120 
cgctttcggc cgcagcgcag gggcacacgc cgccgggagc cgtgcgtggg cgggcctcgg 45180 
gcggcagcgc gccgaaggtg gtcttcgcgc ttcccggtca gggctcgcag tgggtgggca 45240
cgggccgaaa gctcacggcc gaagagccgg tcttccgggc ggcgctggag ggctgcgacc 45300 
gggccatcga ggcggaagcg ggctggccgc tgctcgggga gctctccgcc gacgaggccg 45360 
ccccgcagct cgggcgcatc gacgcggccc agccggtgct cttcgccatg gaagtagcgc 45420 
crcccgcgcc gtggcggtcg tggggagcgg agccggaagc ggtggtgggc cacagcacgg 45480 
gcgaggttgc ggcggcgcac gcggccggcg cgctgtcgct cgaggacgcg gcggcgacca 45540 
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tctgccggcg 
agccgtcgct 
cggcgagcaa 
tgctggcggc 
gccatagccc 
ggccgcgagc 
cggagctcgg 
cggcgcaagc 
tcctggtgcc 
gctcgctgcg 
gggcgtccgg 
cgctgccgac 
gccgcctcgc 
tgccccgcgc 
ggggtggggt 
tgcttcatgc 
gccgaaacga 
gggcatcggc 
tggttcgatt 
catgcacggt 
cgcgcgccgc 
agaagagccc 
atcaactggc 
agggcgacgc 
tgggtggcct 
tgctcaccag 
aggcccgcgc 
tggcagcggt 
ccccgttgcg 
cggacgaggc 
accggctgct 
tgtggggtgg 
cgcaccatcg 
agggaggcat 
tggccacggg 
gttcggtcac 
gcaactxgct 
cggcaaaccg 
tcgctcgcgg 
gccgaggctc 
ttcagcgcga 
. agcggctggt 
ggcacacccg 
tcccaggtgg 
tcagcaccga 
aggttccggg 
atgcggcgtt 
tgttgctgga. 
gcgagagcgc 
agggcctcga 
ccgctggacg 
cctgctcgtc 
gcgaccaggc 
cgtcgcgcat 
acggctttgc 
agcgcgaccg 
cgagcagcgg 
tggcgcaagc 
cagcgctggg 
ccgcggagcg 
cggcgggcct 
ctcaaccgga 
tcgtccgcag 
ctttcggcct 
ctgtggccgc 



cagccggctg 
ggaggaggcc 
cagcccgcgc 
gccgacggcc 
gcaggccgac 
ggctgcggtg 
cgcgagctac 
gctgctggag 
gcccctggac 
gcgagggcag 
ctatccggtg 
ctatccctgg 
cgcagccgac 
cgccccgaaa 
cggtgaggcg 
gccggccgac 
ccggcaggga 
cgacgaagtc 
cctgagcgct 
gggcggcgag 
ggcgctggag 
gacggagatc 
gttccgcagc 
cgcaccgata 
tggcccgctc 
ccggcacggg 
gcgcatcgca 
ggatgtcgcc 
cggggtggcg 
cctgctggag 
gcgcgaccgg 
caaaggccaa 
ccgcgcgcac 
ggttgatgca 
gccggccttg 
acggatggac 
ttcggctctg 
gatctggcgc 
catcgtcgcc 
cgccgagcag 
gctgggcgaa 
ggcgcatctc 
gccggcggcg 
ggacgagggc 
ggcgccagcc 
ccggacctac 
ccccgccact 
ggtgagccgg 
cacgggcgcg 
cgacgacgcg 
gccgtcgtxc 
gtcgctggtg 
cctggccggc 
gcgtttgctt 
gcgggccgag 
cgaccccatc 
gctcacggtg 
gggcgtggcg 
tgacccgatc 
gccgctctgg 
ggccggcgcg 
gctcgacgag 
ggcggtcccc 
gagcgggacc 
ggcccccgag 



ctgcggcgga 
gaggcggcgc 
tcgaccgtgc 
aagggggcgt 
ccgctgcgcg 
ccgatgcgcc 
cgggcggaca 
ggtggccccg 
gagatccaga 
gacgagcgcg 
agctgggctc 
cagcacgagc 
cccaccaagg 
tcggagacag 
gccgctgcag 
gcctccaccg 
gtcctctacc 
agcgaggcta 
gcgccccatc 
ccagaggcct 
caccccgctg 
gagcccctgg 
ggtcgcaggc 
tcgccgtccg 
gcggctcggt 
ctgccagagc 
gcggtcgagg 
graggccgacc 
cacgccgccg 
tcggtgctcc 
cctcccgacc 
ggcgcatacg 
tcgctgccgg 
aaggctcatg 
tcggcgctgg 
tgggcgcgct 
gtcgcggagg 
ggcctgtccg 
cgggtgccgg 
gggctcgact 
cggctgtcgg 
ctcaccgacg 
gcggatgacg 
ctggagacat 
gaccggtggc 
gtggccaagg 
tcccctcgcg 
gaggcgatcg 
ttcgtgggca 
gcgttgctgt 
ttcctgggtc 
gcgttgcacc 
gggtccagcg 
tcgccagatg 
ggctgcgccg 
ctggcggtgg 
cccagcggtc 
ccggccgagg 
gaggtgcagg 
ctgggcgctg 
ctcaaggtgc 
ctcaacccgc 
tggccgcgcg 
aacgcgcacg 
cgcgcagcgg 



tcagcggtca 
tgcgtggcca 
tcgccggcga 
tctggcggca 
aagagctgat 
cgacggtgac 
accctcggca 
cgccgcccat 
cggcggccga 
cgacgctgcc 
ggccgttccc 
ggtgctggat 
actggttcta 
ctcatgggag 
cgctgtcgac 
tcgccgagca 
tgtggggcct 
cccgccgtgc 
ctcctcgctt 
ctctttgcca 
cccggggtgg 
tggccgagct 
acgcagcacg 
cggaggggag 
ggctggcgga 
gacaggcgcc 
ggctggaagc 
ccatgacggc 
gcgtcttccc 
gccccaaggt 
tgttcgcgcc 
ccgcggccaa 
cgttgagcct 
cacgtctgag 
agcgcctggc 
ccgcgccggt 
acgagcgcgc 
ttgcggagag 
gcctctccga 
ccctgatggc 
cgactctggc 
tgccgaagct 
acatcgccat 
actggcggca 
gcgcggcgga 
gtgccttcct 
aggcgatgag 
agcgcgctgg 
tgatcgggag 
acggcaccac 
tgcacggccc 
tcgcctgcca 
tgcttttgtc 
ggcggtgcaa 
tggtggtgct 
tcaggagcac 
ctgcccagca 
tcgatttcgr 
cgccgggcgc 
tcaaggccaa 
ccttggcgct 
acatcccgcg 
gcgcgcgccc 
cggcgctgga 
agctgctcgt 



gggggagatg 
tgagggtcgg 
gccggcggcg 
ggtgaaggcg 
cgcggcgctg 
gggcggggcg 
gccggcgcgc 
cgagacgagc 
gcaagggggc 
ggaggcgctg 
cgcgggcggc 
cgaggtcgag 
ccgaacggac 
ctggctgctg 
gcgcggactt 
ggtacccgaa 
cgacgccgtc 
caccgcaccc 
ccgggcggtg 
agcggcgccg 
cctcgtggac 
gccctcgccg 
ccttgtagcc 
ccacctggtg 
gcggggagct 
gggcggagag 
gcagggcgcg 
gctgccggcc 
cgcgcgtcac 
ggccgggagc 
gttctcgtcg 
tgcgttcctc 
cgcccggggc 
cgacatcggg 
gaacaccagc 
ctatgccgcg 
cgcgcctccc 
ccgcccagcc 
cccgggcgcg 
cccggagacc 
cctcgaccac 
ggaggaccgg 
cgtcggtgcc 
tctggccgag 
ctggcacgac 
ccgcgatgtg 
cctggacccg 
ccaggacccg 
cgagcacgcc 
cggcaacctg 
gacgacgacg 
gagcctgcga 
gccgcggtca 
gacgttctcg 
caagcggctc 
ggcgancaac 
ggcgttgcta 
ggagtgccac 
ggrcgtacggg 
cctcggccac 
ggagcacgag 
ggcagagctg 
gcgtcgtgca 
ggaggcgccg 
cctgtcggcg 



gcgccggtcg 
ctgagcgcgg 
cpcccggagg 
gacgccgcca 
ggagcgatcc 
atcgcgggtc 
ttcgccgcgg 
ccgcacccga 
gctgcggcgg 
gggacgctgc 
aggcgggctc 
cctgacgccc 
tggcccgagg 
tcggccgaca 
tcccgcaccg 
gccgccagtc 
gccgacgctg 
gtcctcgggc 
acccgcgggg 
cggggcctcg 
ccggatcctc 
gacgccgagg 
gccccgccgg 
acgggcgggc 
cgacacccgg 
cagccgccgg 
cgggcgaccg 
gccaccgagc 
ctggcggaga 
tggctgctgc 
ggcgcggcgg 
gacgggctcg 
ctatgggccg 
gtcctgccca 
gctgcccagc 
cgagggcggc 
ccggcgccga 
ccccacgagc 
ctcgacgtcg 
cgtaaccgcc 
ccgacggtgg 
agcgacaccc 
gcctgccggc 
ggcatggtgg 
cccgatccgg 
cgcagctcgg 
caacagcggc 
acggcgctgc 
gagcgggtgc 
ctcagcgtcg 
gtggacaccg 
ttgggcgagt 
ttcgccgcgg 
gccgctgcag 
cgtgacgcgc
cacgatggcc 
cgccaggcgc 
gggacgggga 
cggggccgcc 
ctggaggccg 
cagattccgg 
ccagtggccg 
ggcgtgagcg 
gcggtggagc 
aagagcgcgg 



45600 
45660
45720 
45780 
45840 
45900 
45960 
46020 
46080 
46140 
46200 
46260 
46320 
46380 
46440 
46500 
46560 
46620 
46680 
46740 
46800 
46860 
46920 
46980 
47040 
47100 
47160 
47220 
47280 
47340 
47400 
47460 
47520 
47580 
47640 
47700 
47760 
47820 
47880 
47940 
48000 
48060 
48120 
48180 
48240 
48300 
48360 
48420 
48480 
48540 
48600 
48660 
48720 
48780 
48840 
48900 
48960 
49020 
49080 
49140 
49200 
49260 
49320 
49380 
49440 
cggcgctgga tgcgcaggca gcccggctgc gggaccagct ggagaagcat gtcgagcttg 49500
gcctcggcga tgtggcgttc agcccggcga cgacgcgcag cgcgatggag caccggctgg 49560 
cggtggccgc gagctcgcgc gaggcgctgc gaggggcgct ttcggccgca gcgcaggggc 49620 
acacgccgcc gggagccgtg cgtgggcggg cctcgggcgg cagcgcgccg aaggtggtct 49680
tcgcgtttcc cggccagggc tcgcagtggg tgggcatggg ccgaaagctc atggccgaag 49740 
agccggtctt ccgggcggcg ctggagggtt gcgaccgggc catcgaggcg gaagcgggct 49800 
ggccgctgct cggggagctc tccgccgacg aggccgcctc gcagctcggg cgcatcgacg 49860 
tggttcagcc ggtgctgttc gccatggaag tagcgctttc tgcgctgtgg cggtcgtggg 49920 
gagtggagcc ggaagcggtg gtgggccaca gcatgggcga ggttgcggcg gcgcacgtgg 49980 
ccggcgcgct gtcgctcgag gacgcggcgg cgatcatctg ccggcgcagc cggctgctgc 50040 
ggcggatcag cggtcagggg gagatggcgc tggtcgagct gtcgctggag gaggccgagg 50100
cggcgctgcg tggccatgag ggtcggccga gcgtggcggt gagcaacagc ccgcgctcga 50160
ccgtgctcgc cggcgagccg gcggcgctct cggaggtgcc ggcggcgctg acggccaagg 50220
gggtgttctg gcggcaggtg aaggtggacg tcgccagcca tagcccgcag gtcgacccgc 50280 
cgcgcgaaga gctgatcgcg gcgctgggag cgatccggcc gcgagcggct gcggtgccga 50340
tgcgctcgac ggtgacgggc ggggtgatcg cgggtccgga gctcggtgcg agccaccggg 50400
cggacaacct tcggcagccg gtgcgcttcg ctgcggcggc gcaagcgccg ctggagggcg 50460
gccccgcgct gttcatcgag atgagcccgc acccgatcct ggtgccgccc ccggacgaga 50520
tccagacggc ggccgagcaa gggggcgctg cggtgggccc gctgcggcga gggcaggacg 50580
agcgcgcgac gctgctggag gcgctgggga cgctgtgggc gtccggctat ccggcgagct 50640
gggcccggct gttccccgcg ggcggcaggc gggttccgct gccgacctat ccccggcagc 50700
acgagcggca ccggatcgag gacagcgtgc acgggtcgaa gccctcgccg cggctccggc 50760
agcctcgcaa cggcgccacg gaccacccgc tgctcggggc tccattgctc gcctcggcgc 50820
gacccggagc tcacttgcgg gagcaagcgc tgagcgacga gaggctatcc taccttccgg 50880
aacacagggt ccacggcgaa gccgtgttgc ccagcgcggc gtatgtagag atggcgctcg 50940 
ccgccggcgt agatctctat ggcacggcga cgctggcgct ggagcagctg gcgcccgagc 51000
gagccctcgc cgtgccctcc gaaggcggac gcaccgtgca agtggccccc agcgaagaag 51060
gccccggtcg ggcctcattc caggtatcga gtcgtgagga ggcaggcagg agctgggtgc 51120 
ggcacgccac ggggcacgtg tgtagcggcc agagctcagc ggtgggagcg ctgaaggaag 51180
ccccgcggga gattcaacgg cgatgcccga gcgtcctgtc gtcggaggcg ctctatccgc 51240
cgctcaacga gcacgccctc gactatggtc cctgcttcca gggcgcggag caggcgtggc 51300
ccggcacggg ggaggtgctc ggccgggtac gcttgccagg agacacggca tcctcaagtg 51360
gcgcctaccg gactcatccc gccttgttgg atgcatgttc tcaggtgctg acagcgctgc 51420 
tcaccacgcc ggaatccatc gagactcgga ggcggctgac ggacctccac gaaccggatc 51480
tcccgcggtc cagggctccg gtgaatcaag cggtgagtga cacctggctg tgggacgccg 51540 
cgctggacgg tggacggcgc cagagcgcga gcgtgcccgt cgacctggtg ctcggcagct 51600
tccatgcgaa gtgggaggtc atggagcgcc tcgcgcaggc gtacatcatc ggcactctcc 51660
gcacatggaa cgtcttctgc gctgccggag agcgtcacac gatagacgag ccgcccgcca 51720
ggcctcaaat ctccgtcgtc tacaggaagg tcatcaagcg acggacggaa caccctgccg 51780
cgatcggcat ccttgtaggg gacggagagc actttgtgag cccccagccg ccgccggagc 51840
ccgacttggc ggcggtgctc gaggaggccg ggagggtgct cgccgacccc ccagtcctac 51900
ccgagtggcg caagtttgcc ggggaacggc tcgcggacgc actgaccggt aagacgctcg 51960
cgcccgagat ccccctcccc ggtggcccgc ccgacacggc ggagcgaatc catcgagatc 52020
cgcccatcgc ccgttactcg aacggcaccg tgcgcggtgc cgtcgagccg gcggcgcggg 52080 
tggtagcacc gtcgggaacg ctcagcaccc tggagatcgg agcagggacg ggcgcgacca 52140
ccgccgccgt cctcccggtg ctgctgcctg accggacgga gtaccactcc accgangctt 52200
ccccgctctt ccctgctcgc gcggagcaaa gatttcgaga tcatccattc ctgaagtatg 52260 
gcatcccgga tgtcgaccag gagccagctg gccagggata cgcacatcag aggtttgacg 52320
tcatcgccgc ggccaatgtc atccatgcga cccgcgatat aagagccacg gcgaagcgtc 52380
tcccgtcgtt gctcgcgccc ggaggccttc cggtgctggt cgagggcaca gggcatccga 52440
tctggttcga tatcaccacg ggactgattg aggggtggca gaagtacgaa gatgatcttc 52500 
gtatcgacca tccgctcctg cccgctcgga cctggtgtga cgtcctgcgc cgggtaggct 52560
ttgcggacgc cgtgagtctg ccaggcgacg gatctccggc ggggatccnc ggacagcacg 52620
tgatcctccc gcgcgcgccg ggcacagcag gagccgcttg tgacagctcc ggtgagccgg 52680
cgaccgaatc gccggccgcg cgcgcagcac ggcaggaatg ggccgacggc xccgccgacg 52740 
tcgcccatcg gatggcgttg gagaggacgt acttccaccg ccggccgggc cggcaggtct 52800
gggcccacgg tcgactgcgc accggcggag gcgcgttcac gaaggcgctc gccggagatc 52860
tgcccctgct cgaagacacc gggcaggccg cggcagaggt ccaggggccc cgcccgccgc 52920
agctcgaggc ttctgctttc gcgccgcggg acccgcggga agagtggttg tacgctttgg 52980
aacggcagcg caaagacccc acaccagagg ctccggcagc cgcgtcttct tcctccgcgg 53040 
gggcctggcc cgtgcLgacg gaccagggcg ggacaggcgc tgcgcccgta tcgccgccgg 53100
aagggegagg cgaggcgtgc gcgcgcgcca tcgcgggcac ggcacacgcc tgcctcgcgc 53160 
cggggccgta ccaagtcgac ccggcgcagc cagacggctt tcacaccccg ctccgcgacg 53220
cacccggcga ggaccggatt tgccgcgcgg cagtgcatac gtggagcccx catgcgacgg 53280
cagcagggga gagggcgaca gcggagccgc cccaggccga ccaacccctg gggagcccga 53340
gcgcgctttc tctggtgcag gcgctggtgc gccggaggtg gcgcaacatg ccgcggcttt 53400
ggctcttgac ccgcgccgtg catgcggtgg gcgcggagga cgcagcggcc tcggtggcgc 53460 
aggcgccggt gtggggcctc ggtcggacgc tcgcgctcga gcatccagag ctgcggtgca 53520 
cgctcgtgga cgtgaacccg gcgccgtctc cagaggacgc agccgcactg gcggtggagc 53580
tcggggcgag cgacagagag gaccaggtcg cattgcgctc ggatggccgc tacgtggcgc 53640 
gcctcgtgcg gagctccttt tccggcaagc ctgctacgga ttgcggcatc cgggcggacg 53700 
gcagctatgt gatcaccgat ggcatgggga gagtggggct ctcggtcgcg caacggatgg 53760 
tgatgcaggg ggcccgccat gtggtgctcg tggatcgcgg cggcgcttcc gaggcacccc 53820 
gggatgccct ccggtccatg gccgaggctg gcgcggaggt gcagatcgcg gaggccgacg 53880 
tggctcggcg cgacgatgtc gcccggctcc tctcgaagat cgaaccgtcg atgccgccgc 53940
ttcgggggat cgtgtacgtg gacgggacct tccagggcga ctcctcgatg ctggagctgg 54000 
atgcccgtcg cttcaaggag tggatgtatc ccaaggtgct cggagcgtgg aacctgcacg 54060 
cgctgaccag ggatagatcg ctggacttct tcgtcctgta ttcctcgggc acctcgcttc 54120 
tgggcttgcc aggacagggg agccgcgccg ccggtgacgc cttcttggac gccatcgcgc 54180 
atcaccggtg caaggtgggc cttacagcga tgagcatcaa ctggggattg ctctccgaag 54240 
catcatcgcc ggcgaccccg aacgacggcg gagcacggct cgaataccgg gggatggaag 54300 
gcctcacgcc ggagcaggga gcggcggcgc tcgggcgctt gctcgcacga cccagggcgc 54360 
aggtaggggt gatgcggctg aatctgcgcc agtggctgga gttctatccc aacgcggccc 54420 
gattggcgct gtgggcggag ctgctgaagg agcgtgaccg cgccgaccga ggcgcgtcga 54480 
acgcgtcgaa cctgcgcgag gcgctgcaga gcgccaggcc cgaagatcgt cagttgattc 54540 
tggagaagca cttgagcgag ctgttggggc gggggctgcg ccttccgccg gagaggatcg 54600 
agcggcacgt gccgttcagc aatctcggca tggactcgct gataggcctg gagctccgca 54660 
accgcatcga ggccgcgctc ggcatcaccg tgccggcgac cctgctatgg acctacccta 54720 
acgtagcagc tctgagcggg agcttgctag acattctgtt tccgaatgcc ggcgcgaccc 54780 
acgctccggc caccgagcgg gagaagagct tcgagaacga tgccgcagat ctcgaggctc 54840 
tgcggggcat gacggacgag cagaaggacg cgttgctcgc cgaaaagctg gcgcagctcg 54900 
cgcagatcgt tggtgagtaa gggaccgagg gagtatggcg accacgaacg ccgggaagct 54960 
tgagcatgcc cttctgctca tggacaagct tgcgaaaaag aacgcgtctt tggagcaaga 55020 
gcggaccgag ccgatcgcca tcgtaggcat tggccgccgc ttccccggcg gagcggacac 55080 
cccggaggca ccctgggagc tgctcgactc aggccgagac gcggtccagc cgctcgaccg 55140 
gcgctgggcg ctggtcggcg tccatcccag cgaggaggtg ccgcgctggg ccggaccgcc 55200 
caccgaggcg gtggacggct tcgacgccgc gctctttggc acctcgcctc gggaggcgcg 55260 
gtcgctcgat cctcagcaac gcctgctgct ggaggtcacc tgggaagggc tcgaggacgc 55320 
cggcatcgca ccccagtccc tcgacggcag ccgcaccggg gtgttcctgg gcgcatgcag 55380 
cagcgactac tcgcataccg ttgcgcaaca gcggcgcgag gagcaggacg catacgacat 55440 
caccggcaat acgctcagcg tcgccgccgg acggttgtct tatacgctag ggctgcaggg 55500 
accctgcctg accgtcgaca cggcctgctc gtcgtcgctc gtggccatcc accctgcctg 55560
ccgcagcctg cgcgctcgcg agagcgatct cgcgctggcg ggaggcgtca acatgctcct 55620 
ctcgtccaag acgatgataa tgctggggcg catccaggcg ctgtcgcccg atggccactg 55680 
ccggacattc gacgcctcgg ccaacgggtt cgtccgtggg gagggctgcg gtatggtcgt 55740 
gctcaaacgg ccctccgacg cccagcgaca cggcgatcgg atctgggctc tgatccgggg 55800 
ttcggccatg aatcaggatg gccggtcgac agggttgatg gcacccaatg cgcccgccca 55860
ggaggcgctc ttgcgcgagg cgccgcagag cgcccgcgcc gacgccgggg ccatcggtta 55920 
tgtcgagacc cacggaacgg ggacctcgct cggcgacccg atcgaggtcg aggcgctgcg 55980 
tgccgtgttg gggccggcgc gggccgatgg gagccgctgc gtgctgggcg cagtgaagac 56040 
aaacctcggc cacctggagg gcgctgcagg cgtggcgggt ttgatcaagg cggcgctggc 56100 
tctgcaccac gaactgatcc cgcgaaacct ccatccccac acgctcaatc cgcggatccg 56160 
gatcgagggg accgcgcccg cgctggcgac ggagccggcg ccgtggccgc gggcgggccg 56220 
accgcgctcc gcgggggtga gcgcgcccgg ccccagcggc accaacgtcc atgtcgtgct 56280 
ggaggaggcg ccggccacgg tgctcgcacc ggcgacgccg gggcgctcag cggagctttt 56340 
ggtgctgtcg gcgaagagcg ccgccgcgct ggacgcacag gcggcgcggc tctcagcgca 56400 
catcgccgcg tacccggagc agggtctcgg agacgtcgcg ttcagcctgg tatcgacgcg 56460 
tagcccgatg gagcaccggc tcgcggtggc ggcgacctcg cgcgaggcgc tgcgaagcgc 56520 
gctggaggtt gcggcgcagg ggcagacccc ggcaggcgcg gcgcgcggca gggccgcttc 56580 
ctcgcccggc aagctcgcct tcctgttcgc cgggcagggc gcgcaggtgc cgggcatggg 56640
ccgtgggtcg tgggaggcgt ggccggcgtt ccgcgagacc ttcgaccggt gcgtcacgct 56700 
cttcgaccgg gagctccatc agccgctctg cgaggtgatg tgggccgagc cgggcagcag 56760 
caggtcgtcg ttgctggacc agacggcgtt cacccagccg gcgctctttg cgctggagta 56820 
cgcgctggcc gcgctcttcc ggtcgtgggg cgtggagccg gagctcgccg ctggccatag 56880 
cctcggcgag ctggtggccg cctgcgtggc gggtgtgttc tccctcgagg acgccgtgcg 56940 
cttggtagtc gcgcgcggcc ggttgatgca ggcgctgccg gccggcggcg cgatggtatc 57000 
gatcgccgcg ccggaggccg acgtggctgc cgcggtggcg ccgcacgcag cgttggtgtc 57060 
gatcgcggca gtcaatgggc cggagcaggt ggtgatcgcg ggcgccgaga aattcgtgca 57120
gcagatcgcg gcggcgttcg cggcgcgggg ggcgcgaacc aaaccgctgc atgtctcgca 57180 
cgcgttccac tcgccgctca tggatccgat gctggaggcg ttccggcggg tgactgagtc 57240 
ggtgacgtac cggcggcctt cgatcgcgct ggtgagcaac ctgagcggga agccctgcac 57300
cgatgaggtg agcgcgccgg gttactgggt gcgtcacgcg cgagaggcgg tgcgcttcgc 57360 
ggacggagtg aaggcgctgc acgcggccgg tgcgggcctc ttcgtcgagg tggggccgaa 57420 
gccgacgctg ctcggccttg tgccggcctg cccgccggat gccaggccgg cgctgctccc 57480 
agcgtcgcgc gccgggcgtg acgaggctgc gagcgcgcta gaggcgctgg gtgggttctg 57540 
ggtcgtcggt ggatcggtca cctggtcggg tgtcttccct tcgggcggac ggcgggtacc 57600
gctgccaacc tatccctggc agcgcgagcg ttactggatc gaagcgccgg tcgatcgtga 57660 
ggcggacggc accggccgtg ctcgggcggg gggccacccc cttccgggcg aagtcttttc 57720 
cgtgtcgacc catgccggtc tgcgcctgtg ggagacgacg ctggaccgaa agcggctgcc 57780 
gtggctcggc gagcaccggg cgcaggggga ggtcgtgttt cctggcgccg ggcacccgga 57840 
gatggcgctg tcgtcggggg ccgagacctt gggcgatgga ccgatccagg ccacggatgt 57900 
ggtgctcatc gagacgccga ccttcgcggg cgatacggcg gtaccggccc aggtggtgac 57960 
gaccgaggag cgaccgggac ggctgcggtt ccaggtagcg agtcgggagc cgggggaacg 58020
tcgcgcgccc ttccggatcc acgcccgcgg cgtgctgcgc cggaccgggc gcgtcgagac 58080 
cccggcgagg tcgaacctcg ccgccctgcg cgcccggcct catgccgccg tgcccgctgc 58140 
ggctatctat ggtgcgctcg ccgagatggg gcttcaatac ggcccggcgt tgcgggggct 58200 
cgccgagctg tggcggggcg agggcgaggc gctgggcagg gtgagaccgc ctgaggccgc 58260 
cggccccgcg acagcctacc agctgcatcc ggtgctgccg gacgcgtgcg tccaaacgac 58320 
tgttggcgcg ctcgccgatc gcgacgaggc gacgccgtgg gcgccggcgg aggcgggccc 58380 
ggcgcggccg ttccagcggt cccccgggga gccatggtgc cacgcgcgcg ccgtgagcga 58440 
tggtcaacag gcccccagcc ggcggagcgc cgactttgag tcgacggacg gcacgggcgc 58500 
ggtggtcgcc gagatctccc ggctggtggt ggagcggcct gcgagcggcg cacgccggcg 58560 
cgacgcagac gactggctcc tggagctgga ttgggagccc gcggcgcccg gtgggcccaa 58620 
gatcacagcc ggccggtggc tgctgctcgg cgagggtggt gggcccgggc gcccgttgtg 58680 
ctcggcgctg aaggccgccg gccacgccgt cgtccacgcc gcgggggacg acacgagcac 58740 
cgcaggaatg cgcgcgctcc tggccaacgc gttcgacggc caggccccga cggccgtggt 58800 
gcacctcagc agcctcgacg ggggcggcca gctcggcccg gggctcgggg cgcagggcgc 58860 
gctcgacgcg ccccggagcc cagatgtcga tgccgatgcc cccgaaccgg cgctgatgcg 58920 
cggtcgcgac agcgcgcccc ccctggcgca agcgctggcc ggcacggacc tccgaaacgc 58980 
gccgcggctg tggcccttga cccgcggggc tcaggcggcc gccgccggcg acgtctccgc 59040 
ggtgcaagcg ccgcrgttgg ggccgggccg caccatcgcc ttggagcacg ccgagctgcg 59100 
ctgcatcagc gtcgacctcg atccagccga gcctgaaggg gaagccgatg ctttgctggc 59160 
cgagctactt gcagacgacg ccgaggagga ggtcgcgctg cgcggtggcg accggctcgt 59220 
tgcgcggctc gtccaccggc xgcccgacgc tcagcgccgg gagaaggtcg agcccgccgg 59280 
tgacaggccg ttccggctag agatcgacga acccggcgcg ctggaccaac tggtgctccg 59340 
agccacgggg cggcgcgctc ccggtccggg cgaggtcgag atccccgccg aagcggcggg 59400 
gctcgactcc accgacatcc agccggcgct gggcgtcgcc cccaacgacc cgcctggaga 59460 
agaaatcgag ccgctggcgc ccggaagcga gtgcgccggg cgcaccgccg ctgtgggcga 59520 
gggcgtgaac ggcctrgcgg tgggccagcc ggtgaccgcc cccgcggccg gagtatttgc 59580 
tacccacgtc accacgccgg ccacgctggt gctgccxcgg cctccggggc cctcggcgac 59640 
cgaggcggcc gcgatgcccc ccgcgtattx gacggcctgg cacgcccncg acaaggccgc 59700 
ccacctgcag gcgggggagc gggtgctgat ccatgcggag gccggtggcg Lcggtctttg 59760 
cgcggtgcga tgggcgcagc gcgtgggcgc cgaggtgtat gcgaccgccg acacgcccga 59820 
gaaccgtgcc tacctggagt cgccgggcgt gcggtacgtg agcgatcccc gcccgggccg 59880 
gttcgccaca gacgtgcatg catggacgga cggcgagggt gcggacgtcg tgctcgaccc 59940 
gctttcgggc gagcgcatcg acaagagcct catggtcccg cgcgcccgcg gtcgccttgt 60000 
gaagctgggc aggcgcgacg actgcgccga cacgcagccc gggccgccgc cgctcctacg 60060 
gaatttttcc ttctcgcagg cggacttgcg gggaatgacg ctcgatcaac cggcgaggat 60120 
ccgtgcgctc ctcgacgagc tgttcgggtt ggtcgcagcc ggtgccatca gcccaccggg 60180 
gtcggggrtg cgcgttggcg gatccctcac gccaccgccg gtcgagacct tcccgatctc 60240 
ccgcgcagcc gaggcattcc ggaggacggc gcaaggacag cacctcggga agctcgtgct 60300 
cacgccggac gacccggagg rgcggatccg cgccccggcc gaacccagcg tcgccgtccg 60360 
cgcggacggc acctaccttg cgaccggcgg tctgggtggc cccggcctgc gcgtggccgg 60420 
atggctggcc gagcggggcg cggggcaact ggtgctggtg ggccgctccg gtgcggcgag 60480 
cgcagagcag cgagccgccg tggcggcgct ggaggcccac ggcgcgcgcg tcacggcggc 60540 
gaaagcggac gtcgccgatc ggtcacagat cgagcgggtc ctccgcgagg ttaccgcgtc 60600 
ggggatgccg ctgcggggcg tcgtgcatgc ggcaggtctc gcggatgacg ggctgctgat 60660 
gcagcagact ccggcgcggt tccgcacggc gatgggaccc aaggnccagg gggccttgca 60720 
cttgcacacg ctgacacgcg aagcgccccc tcccttcttc gcgccgtacg cctccgcagc 60780 
cgggcctccc ggcccgccag gccagggcaa ctatgccgca gccaacgcgt tcctcgacgc 60840 
cctttcgcac caccgaaggg cgcagggcct gccggcgccg agcaccgacc ggggcacgtc 60900 
cacggaggcg gggacggccg ccgcgcaaga aaaccgtggc gcgcggcaga tctctcgcgg 60960 
gacgcggggc accacccccg acgagggtct gtcagctctg gcgcgcccgc tcgagggtga 61020 
tcgcgcgcag acgggggcga caccgatcac tccgcggcag cgggcggagt cctacccggc 61080 
aacagcggcc tcacggaggc cgccgcggcc ggtgaccacg cagcgcgcgg tcgctgatcg 61140 
gaccgccggg gatcgggacc tgctcgaaca gcttgcgtcg gctgagccga gcgcgcgggc 61200
ggggctgctg caggacgtcg tgcgcgtgca ggcctcgcat gtgctgcgtc tccctgaaga 61260
caagatcgag gtggatgccc cgctctcgag cacgggcacg gactcgctga tgagcctgga 61320 
gctgcgcaac cgcatcgagg ccgcgctggg cgccgccgcg cctgcagcct tggggtggac 61380 
gtacccaacg gtagcagcga taacgcgctg gctgcccgac gacgcccccg tcgtccggct 61440
tggcggcggg tcggacacgg acgaatcgac ggcgagcgcc ggtccgttcg tccacgtcct 61500 
ccgctttcgt cctgtcgtca agccgcgggc tcgcctcttc cgttttcacg gttccggcgg 61560 
ctcgcccgag ggcccccgtt cctggtcgga gaagtctgag tggagcgatc tggaaatcgt 61620 
ggccatgtgg cacgatcgca gcctcgcctc cgaggacgcg cctggtaaga agcacgtcca 61680
agaggcggcc tcgctgattc agcactatgc agacgcaccg tttgcgttag tagggcccag 61740 
cccgggtgtc cggttcgtca tggggacagc cgtggagctc gccagtcgtt ccggcgcacc 61800 
ggctccgctg gccgtcttca cgttgggcgg cagcttgacc tcttcttcag agatcacccc 61860 
ggagatggag accgatataa tagccaagct cttcttccga aatgccgcgg gtttcgtgcg 61920 
acccacccaa caagtccagg ccgatgctcg cgcagacaag gtcaccacag acaccacggc 61980 
ggctccggcc cccggggacc cgaaggagcc gcccgtgaag accgcggccc ccaccgccgc 62040 
catcgccggc tcggacgacg tgaccgtgcc tccgagcgac gttcaggapc cacaacctcg 62100
caccacggag cgcttctata tgcatctccc tcccggagat cacgaatctc tcgtcgatcg 62160 
agggcgcgag atcatgcaca tcgtcgaccc gcacctcaat ccgctgctcg ccgcgaggac 62220 
gacgccgtca ggccccgcgt tcgaggcaaa atgatggcag cctccctcgg gcgcgcgaga 62280 
tggtcaggag cagcgcgggc gctggcggcc ggcggcaggc cgcggaggcg cangagccct 62340 

cccggacgcc tgcagtacag gagactttat gacacaggag caagcgaacc agagcgagac 62400
gaagcctgct ttcgacttca agccgttcgc gcctgggcac gcggaggacc cgttccccgc 62460
gaccgagcgc ctgagagagg caacccccat cttctactgg gatgaaggcc gctcccgggt 62520 
ccccacccga taccacgacg tgtcggcggt gctccgcgac gaacgcttcg cggtcagccg 62580 
agaaoagtgg gaaccgagcg cggagtactc gccggccatt cccgagccca gcgatatgaa 62640
aaagiacgga ttgttcgggc tgccgccgga ggatcacgct cgggtccgca agctcgtcaa 62700
cccgccgcct acgtcacgcg ccaccgaccc gctgcgcgcc gaaatacagc gcaccgccga 62760 
ccagccgccc gatgctcgct ccggacaaga ggagttcgac gttgtgcggg attacgcgga 62820 
aggaatcccg atgcgcgcga tcagcgccct gttgaaggtt ccggccgagc gcgacgagaa 62880 
gctccgtcgc ctcggcccgg cgactgcgcg cgcgctcggc gtgggtctgg cgccccaggt 62940 
cgatgaggag accaagaccc tggtcgcgtc cgtcaccgag gggctcgcgc tgccccatga 63000 
cgtcctcgat gagcggcgca ggaacccgct cgaaaacgac gtcttgacga tgccgcttca 63060 
ggccgaggcc gacggcagca ggctgagcac gaaggagccg gccgcgctcg tgggtgcgat 63120 
catcgccgct ggcaccgata ccacgatcta ccctatcgcg ttcgctgcgc tcaacccgct 63180
gcggccgccc gaggcgctcg agctggtgaa ggccgagccc gggctcatga ggaacgcgct 63240 
cgacgaggcg ctccgcttcg acaatatcct cagaatagga actgtgcgtt tcgccaggca 63300 
ggacccggag tactgcgggg catcgaccaa gaaaggggag atggtcttcc tcccgatccc 63360 
aagcgccctg agagatggga ctgtactctc caggccagac gcgcttgatg tgcgacggga 63420 
cacaggcgcg agccccgcgt acggcagagg cccccatgtc tgccccgggg tgccccczgc 63480 
ccgcctcgag gcggagatcg ccgtgggcac caccctccgt aggctccccg agacgaagcz 63540 
aaaagaaacc cccgtgcttg gacaccaccc cgcgttccgg aacatcgaat cacccaacgc 63600 

catcccgaag ccctccaaag ctggatagcc cgcgggggta tcgcttcccg aaccccactc 63660
cctcatgata cagctcgcgc gcgggtgctg tccgccgcgg gcgcgattcg atccagcgga 63720 
caagcccatt gtcagcgcgc gaagatcgaa tccacggccc ggagaagagc ccgcccgggt 63780 
gacgtcggaa gaagtgccgg gcgccgccct gggagcgcaa agctcgcccg ttcgcgctca 63840 
acacgccgcc cgccatgtcc ggccctgcac ccgcgccgag gagccgcccg ccctgacgca 63900 
cggcctcacc gagcggcagg ttctgctctc gcccgtcgcc ctcgcgctcg cccccctgac 63960
cgcgcgcgcc ctcggcgagc tcgcgcggcg gccgcgccag cccgaggtgc tcggcgagcc 64020 
ctccggcggc gtggtgccgg gcccgtccgt cgccggcgcg ctcgctcctg ggttccatcg 64080 
agtcctcctc caggatccgg cggtcggggt cgtgctctcc ggcatctcct ggataggcgc 64140 
gcccgtcctg ctgctcatgg cgggtatcga ggtcgatgtg agcatcctgc gcaaggaggc 64200 
gcgccccggg gcgctctcgg cgcccggcgc gaucgcgccc ccgctgcgca cgccggggcc 64260 
gccggtgcag cgcatgcagg gcgcgctcac gtgggatctc gacgtctcgc cgcgacgccc 64320 
tgcgcaagcc tgagcctcgg cgcctgctcg tacaccccgc cggtgctcgc tccgcccgcg 64380 

gacacccggc cgcccgccgc ggcccagctc gagccggact cgccggatga cgaggccgac 64440
gaggccgacg aggcgctccg cccgttccgc gacgcgatcg ccgcgtactc ggaggccgct 64500 
cggcgggcgg aggcggcgca gcggccgcgg ctggagagcc tcgtgcggct cgcgaccgcg 64560
cggccgggca aggcgcccga caaggtcccr ttcgcgcaca cgacggccgg cgccccccag 64620 
accgccggca gaccccagaa cgatgcggcc tggctcgacg ccgccgcccg gtacgcgagc 64680
-nccgcgcgg cgacggagca cgcgctccgc gacgcggcgt cggccatgga ggcgcccgcg 64740
gccggcccgc accgcggatc gagccgcgtg cccgctgccg taggggagtc tcggggggag 64800 
acggcgcgcc ttcaccccgc ggaccgtgta cccgcgcccg accagcagat cccgaccgcg 64860 
ccgcgcgcag ccgagcgggc gctcatcgcg ctctacactg cgttcgcccg tgaggagtga 64920 
acctcccccg ggcgcagccg agcggcggcg tgccggtggc tccctcttcg caaccacgac 64980 
cggagccgcg ctcggtccgc gcagcggcca gcgcgcgtcg cggcagagat cgctggagcg 65040
acaggcgacg acccgcccga gggtgtcgaa cggattgccg cagccctcat tgcggatccc 65100

ctccagacac tcgttcagct gcttggcgtc gatgccgcct gggcactcgc cgaaggtcag 65160 

ctcgtcgcgc cactcggatc ggatcttgtt cgagcacgcg tccttgcccg aatactcccg 65220 

gtcttgtccg atgttgttgc accgcgcctc gcggtcgcac cgcgccgcca cgatgctatc 65280 

gacggcgctg ccgactggca ccggcgcccc gccctgcgcg ccacccgggg tttgcgcctc 65340 

cccgcctgac cgcttttcgc cgccgcacgc cgcgagcagg ctcattcccg acaccgagat 65400 

caggcccacg accagcttcc cagcaatctt ttgcatggct tcccctccct cacgacacgt 65460 

cacatcagag actccccgcc cggctcgtcg gttcgacagc cggcgacggc cacgagcaga 65520 

accgtccccg accagaacag ccgcatgcgg gcttctcgca acatgccccg acatccccgc 65580 

gactagcgtg cctccgctcg tgccgagatc ggctgtcctg tgcgacggca atatcctgcg 65640 

atcggccggg caggaggtac cgacacgggc gccgggcggg aggtgccgcc acgggcccga 65700

aatgtgctgc ggcaggcgcc tccatgcccg cagccgggaa cgcggcgccc ggccagcctc 65760 

ggggtgacgc cgcaaacggg agatgctccc ggagaggcgc cgggcacagc cgagcgccgc 65820 

caccaccgtg cgcactcgtg agctccagct cctcggcata gaagagaccg tcacccccgg 65880 

tccgtgtagg cgatcgtgct gatcagcgcg ctctccgcct gacgcgagtc gagccgggta 65940 

tgctgcacga caatgggaac gtccgattcg atcacgctgg cacagcccgt atcgcgcggg 66000 

atcggctcgg gttcggtcag atcgttgaac cggacgtgcc gggtgcgcct cgct^ggacg 66060 

gtcacccggt acggcccggc ggggtcgcgg tcgctgaagt agacggtgat ggcgacctgc 66120 

gcgtcccggt ccgacgcatt caacaggcag gccgtctcat ggctcgtcat ctgcggctcg 66180 

ggtccgttgc tccggcctgg gatgtagccc cctgcgattg cccagcgcgt ccgcccgatc 66240 

ggcttctcca tatgtcctcc ctgctggctc ctctttggct gcctccctct gctgtccagg 66300 

agcgacggcc tccnctcccg acgcgctcgg ggatccatgg ctgaggatcc tcgccgagcg 66360 

ctccttgccg accggcgcgc cgagcgccga cgggctttga aagcacgcga ccggacacgt 66420 

gatgccggcg cgacgaggcc gccccgcgtc tgatcccgat cgtgacaccg cgacgtccgc 66480 

cggcgcctcc gcaggccggc ctgagcgttg cgcggtcatg gtcgtcctcg cgtcaccgcc 66540 

acccgccgac tcacatccca ccgcggcacg acgcttgctc aaaccgcggc gagacggccg 66600 

ggcggctgtg gtaccggcca gcccggacgc gaggcccgag agggacagtg ggtccgccgt 66660 

gaagcagtga ggcgatcgag gtggcagatg aaacacgctg acacgggccg acgagtcggc 66720 

cgccggacag ggcccacgcc cggtctcctc gcgagcatgg cgctcgccgg ctgtggcggc 66780 

ccgagcgaga aaatcgtgca gggcacgcgg cccgcgcccg gcgccgatgc gcacgtcgcc 66840 

gccgacgccg accccgacgc cgcgaccacg cggctggcgg tggacgtcgt tcacctctcg 66900 

ccgcccgagc gcatcgaggc cggcagcgag cggttcgtcg tctggcagcg tccgagctcc 66960 

gagtccccgt ggcaacgggt cggagtgctc gactacaacg ctgccagccg aagaggcaag 67020 

ccggccgaga cgaccgtgcc gcatgccaac ttcgagctgc tcatcaccgt cgagaagcag 67080 

agcagccctc agtctccatc ctctgccgcc gtcatcgggc cgacgtccgt cgggtaacat 67140 

cgcgctatca gcagcgccga gcccgccagc aggccccaga gccctgcctc gaccgccccc 67200 

tccaccanac cacccccgcg cactcctcca gcgacggccc cgtcgaagca accgccgcgc 67260 

cggcgcggcL ctacgtgcgc gacaggagag cgtcctggcg cggcccgcgc atcgctggaa 67320 

ggatcggcgg agcatggaga aagaaccgag gaccgcgatc tacggcgcca tcgcagccaa 67380 

cgtggcgatc gcggcggtca agttcatcgc cgccgccgtg accggcagct cggcgatgct 67440 

ctccgagggc gtgcaccccc tcgtcgatac tgcagacggg ctcctcctcc tgctcggcaa 67500 

gcaccggagc gcacgcccgc ccgacgccga gcacccgttc ggccacggca aggagctcta 67560 

tttctggacg ctgatcgtcg ccatcatgat ctccgccgcg ggcggcggcg tctcgatcta 67620 

egaagggatc tcgcacctcc tgcacccgcg ccagatcgag gacccgacgt ggaactacgt 67680 

cgccctcggc gcagcggccg tcttcgaggg gacgtcgctc atcacctcga tccacgagtt 67740 

caagaagaag gacggacagg gctacctcgc ggcgatgcgg tccagcaagg acccgacgac 67800 

gttcacgacc gtcctggagg actccgcggc gctcgccggg ctcaccatcg ccttcctcgg 67860 

cgtctggctc gggcaccgcc tgggaaaccc ctacctcgac ggcgcggcgt cgaccggcat 67920 

cggcctcgcg ctcgccgcgg tcgcggtctt cctcgccagc cagagccgtg ggctcctcgt 67980 

gggggagagc gcggacaggg agctcctcgc cgcgatccgc gcgctcgcca gcgcagatcc 68040 

tggcgtgtcg gcggtggggc ggcccctgac gatgcacttc ggtccgcacg aagtcctggt 68100 

cgtgctgcgc atcgagttcg acgccgcgct cacggcgtcc ggggtcgcgg aggcgatcga 68160 

gcgcatcgag acccggatac ggagcgagcg acccgacgtg aagcacatct acgtcgaggc 68220 

caggtcgctc caccagcgcg cgagggcgtg acgcgccgcg gagagaccgc gcgcggcctc 68280 

cgccatcctc cgcggcgccc gggctcaggt ggccctcgca gcagggcgcg cctggcgggc 68340 

aaaccgtgca gacgtcgtcc ttcgacgcga ggcacgctgg ttgcaagtcg ccacgccgta 68400 

tcgcgaggtc cggcagcgcc ggagcccggg cgggccgggc gcacgaaggc gcggcgagcg 68460 

caggcctcga ggggggcgac gtcatgagga aggccagggc gcatggggcg atgctcggcg 68520 

ggcgagacga cggccggcgc cgcggccccc ccggcgccgg cgcgcttcgc gccgcgcccc 68580 

agcgcggtcg cccgcgcgat cccgcccggc gccggctcat cgcctccgcg tccctcgccg 68640 

gcggcgccag catggcggcc gcctcgccgc tccagctcgg gatcatcgag cgcccgcccg 68700

accctccgct rccagggctc gattcggcca aggtgacgag ctccgatatc 68750 

<210> 2 
<211> 1421 
<212> PRT

<213> Sorangium cellulosum
<400> 2 

Val Ala Asp Arg Pro Ile Glu Arg Ala Ala Glu Asp Pro Ile Ala Ile
1 5 10 15 

Val Gly Ala Ser Cys Arg Leu Pro Gly Gly Val Ile Asp Leu Ser Gly
20 25 30 

Phe Trp Thr Leu Leu Glu Gly Ser Arg Asp Thr Val Gly Arg Val Pro 
35 40 45 

Ala Glu Arg Trp Asp Ala Ala Ala Trp Phe Asp Pro Asp Pro Asp Ala 
50 55 60

Pro Gly Lys Thr Pro Val Thr Arg Ala Ser Phe Leu Ser Asp Val Ala 
65 70 75 80 

Cys Phe Asp Ala Ser Phe Phe Gly Ile Ser Pro Arg Glu Ala Leu Arg
85 90 95 

Met Asp Pro Ala His Arg Leu Leu Leu Glu Val Cys Trp Glu Ala Leu 
100 105 110

Glu Asn Ala Ala Ile Ala Pro Ser Ala Leu Val Gly Thr Glu Thr Gly
115 120 125

Val Phe Ile Gly Ile Gly Pro Ser Glu Tyr Glu Ala Ala Leu Pro Gln
130 135 140 

Ala Thr Ala Ser Ala Glu Ile Asp Ala His Gly Gly Leu Gly Thr Met
145 150 155 160 

Pro Ser Val Gly Ala Gly Arg Ile Ser Tyr Ala Leu Gly Leu Arg Gly
165 170 175 

Pro Cys Val Ala Val Asp Thr Ala Tyr Ser Ser Ser Leu Val Ala Val 
180 185 190

His Leu Ala Cys Gln Ser Leu Arg Ser Gly Glu Cys Ser Thr Ala Leu
195 200 205 

Ala Gly Gly Val Ser Leu Met Leu Ser Pro Ser Thr Leu Val Trp Leu 
210 215 220

Ser Lys Thr Arg Ala Leu Ala Arg Asp Gly Arg Cys Lys Ala Phe Ser 
225 230 235 240 

Ala Glu Ala Asp Gly Phe Gly Arg Gly Glu Gly Cys Ala Val Val Val 
245 250 255 

Leu Lys Arg Leu Ser Gly Ala Arg Ala Asp Gly Asp Arg Ile Leu Ala
260 265 270

Val Ile Arg Gly Ser Ala Ile Asn His Asp Gly Ala Ser Ser Gly Leu
275 280 285 

Thr Val Pro Asn Gly Ser Ser Gln Glu Ile Val Leu Lys Arg Ala Leu
290 295 300 

Ala Asp Ala Gly Cys Ala Ala Ser Ser Val Gly Tyr Val Glu Ala His
305 310 315 320 

Gly Thr Gly Thr Thr Leu Gly Asp Pro Ile Glu Ile Gln Ala Leu Asn
325 330 335

Ala Val Tyr Gly Leu Gly Arg Asp Val Ala Thr Pro Leu Leu Ile Gly
340 345 350

Ser Val Lys Thr Asn Leu Gly His Pro Glu Tyr Ala Ser Gly Ile Thr
355 360 365 

Gly Leu Leu Lys Val Val Leu Ser Leu Gln His Gly Gln Ile Pro Ala
370 375 380

His Leu His Ala Gln Ala Leu Asn Pro Arg Ile Ser Trp Gly Asp Leu
385 390 395 400 

Arg Leu Thr Val Thr Arg Ala Arg Thr Pro Trp Pro Asp Trp Asn Thr
405 410 415 

Pro Arg Arg Ala Gly Val Ser Ser Phe Gly Met Ser Gly Thr Asn Ala 
420 425 430 

His Val Val Leu Glu Glu Ala Pro Ala Ala Thr Cys Thr Pro Pro Ala 
435 440 445 

Pro Glu Arg Pro Ala Glu Leu Leu Val Leu Ser Ala Arg Thr Ala Ser 
450 455 460 

Ala Leu Asp Ala Gln Ala Ala Arg Leu Arg Asp His Leu Glu Thr Tyr
465 470 475 480

Pro Ser Gln Cys Leu Gly Asp Val Ala Phe Ser Leu Ala Thr Thr Arg
485 490 495 

Ser Ala Met Glu His Arg Leu Ala Val Ala Ala Thr Ser Arg Glu Gly
500 505 510

Leu Arg Ala Ala Leu Asp Ala Ala Ala Gln Gly Gln Thr Ser Pro Gly
515 520 525 

Ala Val Arg Ser Ile Ala Asp Ser Ser Arg Gly Lys Leu Ala Phe Leu
530 535 540

Phe Thr Gly Gln Gly Ala Gln Thr Leu Gly Met Gly Arg Gly Leu Tyr
545 550 555 560 

Asp Val Trp Ser Ala Phe Arg Glu Ala Phe Asp Leu Cys Val Arg Leu 
565 570 575 

Phe Asn Gln Glu Leu Asp Arg Pro Leu Arg Glu Val Met Trp Ala Glu
580 585 590

Pro Ala Ser Val Asp Ala Ala Leu Leu Asp Gln Thr Ala Phe Thr Gln
595 600 605

Pro Ala Leu Phe Thr Phe Glu Tyr Ala Leu Ala Ala Leu Trp Arg Ser 
610 615 620 

Trp Gly Val Glu Pro Glu Leu Val Ala Gly His Ser Ile Gly Glu Leu
625 630 635 640 

Val Ala Ala Cys Val Ala Gly Val Phe Ser Leu Glu Asp Ala Val Phe 
645 650 655 

Leu Val Ala Ala Arg Gly Arg Leu Met Gln Ala Leu Pro Ala Gly Gly
660 665 670 
Ala Met Val Ser Ile Glu Ala Pro Glu Ala Asp Val Ala Ala Ala Val
675 680 685 

Ala Pro His Ala Ala Ser Val Ser Ile Ala Ala Val Asn Ala Pro Asp
690 695 700

Gln Val Val Ile Ala Gly Ala Gly Gln Pro Val His Ala Ile Ala Ala
705 710 715 720 

Ala Met Ala Ala Arg Gly Ala Arg Thr Lys Ala Leu His Val Ser His
725 730 735 

Ala Phe His Ser Pro Leu Met Ala Pro Met Leu Glu Ala Phe Gly Arg 
740 745 750 

Val Ala Glu Ser Val Ser Tyr Arg Arg Pro Ser Ile Val Leu Val Ser
755 760 765 

Asn Leu Ser Gly Lys Ala Cys Thr Asp Glu Val Ser Ser Pro Gly Tyr
770 775 780 

Trp Val Arg His Ala Arg Glu Val Val Arg Phe Ala Asp Gly Val Lys
785 790 795 800 

Ala Leu His Ala Ala Gly Ala Gly Thr Phe Val Glu Val Gly Pro Lys 
805 810 815 

Ser Thr Leu Leu Gly Leu Val Pro Ala Cys Met Pro Asp Ala Arg Pro 
820 825 830

Ala Leu Leu Ala Ser Ser Arg Ala Gly Arg Asp Glu Pro Ala Thr Val 
835 840 845

Leu Glu Ala Leu Gly Gly Leu Trp Ala Val Gly Gly Leu Val Ser Trp 
850 855 860

Ala Gly Leu Phe Pro Ser Gly Gly Arg Arg Val Pro Leu Pro Thr Tyr 
865 870 875 880 

Pro Trp Gln Arg Glu Arg Tyr Trp Ile Asp Thr Lys Ala Asp Asp Ala
885 890 895 

Ala Arg Gly Asp Arg Arg Ala Pro Gly Ala Gly His Asp Glu Val Glu
900 905 910

Glu Gly Gly Ala Val Arg Gly Gly Asp Arg Arg Ser Ala Arg Leu Asp 
915 920 925 

His Pro Pro Pro Glu Ser Gly Arg Arg Glu Lys Val Glu Ala Ala Gly
930 935 940 

Asp Arg Pro Phe Arg Leu Glu Ile Asp Glu Pro Gly Val Leu Asp His
945 950 955 960 

Leu Val Leu Arg Val Thr Glu Arg Arg Ala Pro Gly Leu Gly Glu Val
965 970 975

Glu Ile Ala Val Asp Ala Ala Gly Leu Ser Phe Asn Asp Val Gln Leu
980 985 990 

Ala Leu Gly Met Val Pro Asp Asp Leu Pro Gly Lys Pro Asn Pro Pro 
995 1000 1005

Leu Leu Leu Gly Gly Glu Cys Ala Gly Arg Ile Val Ala Val Gly Glu
1010 1015 1020 
Gly Val Asn Gly Leu Val Val Gly Gln Pro Val Ile Ala Leu Ser Ala
1025 1030 1035 1040

Gly Ala Phe Ala Thr His Val Thr Thr Ser Ala Ala Leu Val Leu Pro
1045 1050 1055 

Arg Pro Gln Ala Leu Ser Ala Ile Glu Ala Ala Ala Met Pro Val Ala
1060 1065 1070 

Tyr Leu Thr Ala Trp Tyr Ala Leu Asp Arg Ile Ala Arg Leu Gln Pro
1075 1080 1085 

Gly Glu Arg Val Leu Ile His Ala Ala Thr Gly Gly Val Gly Leu Ala
1090 1095 1100

Ala Val Gln Trp Ala Gln His Val Gly Ala Glu Val His Ala Thr Ala
1105 1110 1115 1120

Gly Thr Pro Glu Lys Arg Ala Tyr Leu Glu Ser Leu Gly Val Arg Tyr
1125 1130 1135

Val Ser Asp Ser Arg Ser Asp Arg Phe Val Ala Asp Val Arg Ala Trp 
■ - H40 1145 1150 

Thr Gly Gly Glu Gly Val Asp Val Val Leu Asn Ser Leu Ser Gly Giu 
1155 H60 H65 

Leu lie Asp Lys Ser Phe Asn Leu Leu Arg Ser His Gly Arg Phe Val 
1170 1175 1180 

Glu Leu Gly Lys Arg Asp Cys Tyr Ala Asp Asn Gin Leu Gly Leu Arg 
11 Q5 H90 H95 1200 

Pro Phe Leu Arg Asn Leu Ser Phe Ser Leu Val Asp Leu Arg Gly Met 
1205 1210 1215 

Met Leu Glu Arg Pro Ala Arg Val Arg Ala Leu Leu Glu Glu Leu Leu 
1220 1225 1230 

Gly Leu Ile Ala Ala Gly Val Phe Thr Pro Pro Pro Ile Ala Thr Leu
1235 1240 1245

Pro Ile Ala Arg Val Ala Asp Ala Phe Arg Ser Met Ala Gln Ala Gln
1250 1255 1260

His Leu Gly Lys Leu Val Leu Thr Leu Gly Asp Pro Glu Val Gln Ile
1265 1270 1275 1280

Arg Ile Pro Thr His Ala Gly Ala Gly Pro Ser Thr Gly Asp Arg Asp
1285 1290 1295

Leu Leu Asp Arg Leu Ala Ser Ala Ala Pro Ala Ala Arg Ala Ala Ala
1300 1305 1310

Leu Glu Ala Phe Leu Arg Thr Gln Val Ser Gln Val Leu Arg Thr Pro
1315 1320 1325

Glu Ile Lys Val Gly Ala Glu Ala Leu Phe Thr Arg Leu Gly Met Asp
1330 1335 1340

Ser Leu Met Ala Val Glu Leu Arg Asn Arg Ile Glu Ala Ser Leu Lys
1345 1350 1355 1360

Leu Lys Leu Ser Thr Thr Phe Leu Ser Thr Ser Pro Asn Ile Ala Leu
1365 1370 1375

Leu Ala Gln Asn Leu Leu Asp Ala Leu Ala Thr Ala Leu Ser Leu Glu
1380 1385 1390

Arg Val Ala Ala Glu Asn Leu Arg Ala Gly Val Gln Asn Asp Phe Val
1395 1400 1405

Ser Ser Gly Ala Asp Gln Asp Trp Glu Ile Ile Ala Leu
1410 1415 1420



<210> 3 
<211> 1410 
<212> PRT 

<213> Sorangium cellulosum 
<400> 3 

Met Thr Ile Asn Gln Leu Leu Asn Glu Leu Glu His Gln Gly Ile Lys
1 5 10 15

Leu Ala Ala Asp Gly Glu Arg Leu Gln Ile Gln Ala Pro Lys Asn Ala
20 25 30

Leu Asn Pro Asn Leu Leu Ala Arg Ile Ser Glu His Lys Ser Thr Ile
35 40 45

Leu Thr Met Leu Arg Gln Arg Leu Pro Ala Glu Ser Ile Val Pro Ala
50 55 60

Pro Ala Glu Arg His Ala Pro Phe Pro Leu Thr Asp Ile Gln Glu Ser
65 70 75 80

Tyr Trp Leu Gly Arg Thr Gly Ala Phe Thr Val Pro Ser Gly Ile His
85 90 95

Ala Tyr Arg Glu Tyr Asp Cys Thr Asp Leu Asp Val Pro Arg Leu Ser
100 105 110

Arg Ala Phe Arg Lys Val Val Ala Arg His Asp Met Leu Arg Ala His 
115 120 125 

Thr Leu Pro Asp Met Met Gln Val Ile Glu Pro Lys Val Asp Ala Asp
130 135 140

Ile Glu Ile Ile Asp Leu Arg Gly Leu Asp Arg Ser Thr Arg Glu Ala
145 150 155 160

Arg Leu Val Ser Leu Arg Asp Ala Met Ser His Arg Ile Tyr Asp Thr
165 170 175

Glu Arg Pro Pro Leu Tyr His Val Val Ala Val Arg Leu Asp Glu Arg
180 185 190

Gln Thr Arg Leu Val Leu Ser Ile Asp Leu Ile Asn Val Asp Leu Gly
195 200 205

Ser Leu Ser Ile Ile Phe Lys Asp Trp Leu Ser Phe Tyr Glu Asp Pro
210 215 220

Glu Thr Ser Leu Pro Val Leu Glu Leu Ser Tyr Arg Asp Tyr Val Leu
225 230 235 240

Ala Leu Glu Ser Arg Lys Lys Ser Glu Ala His Gln Arg Ser Met Asp
245 250 255

Tyr Trp Lys Arg Arg Ile Ala Glu Leu Pro Pro Pro Pro Thr Leu Pro
260 265 270

Met Lys Ala Asp Pro Ser Thr Leu Lys Glu Ile Arg Phe Arg His Thr
275 280 285

Glu Gln Trp Leu Pro Ser Asp Ser Trp Gly Arg Leu Lys Arg Arg Val
290 295 300

Gly Glu Arg Gly Leu Thr Pro Thr Gly Val Ile Leu Ala Ala Phe Ser
305 310 315 320

Glu Val Ile Gly Arg Trp Ser Ala Ser Pro Arg Phe Thr Leu Asn Ile
325 330 335

Thr Leu Phe Asn Arg Leu Pro Val His Pro Arg Val Asn Asp Ile Thr
340 345 350

Gly Asp Phe Thr Ser Met Val Leu Leu Asp Ile Asp Thr Thr Arg Asd
355 360 365

Lys Ser Phe Glu Gln Arg Ala Lys Arg Ile Gln Glu Gln Leu Trp Glu
370 375 380

Ala Met Asp His Cys Asp Val Ser Gly Ile Glu Val Gln Arg Glu Ala
385 390 395 400

Ala Arg Val Leu Gly Ile Gln Arg Gly Ala Leu Phe Pro Val Val Leu
405 410 415

Thr Ser Ala Leu Asn Gln Gln Val Val Gly Val Thr Ser Leu Gln Arg
420 425 430

Leu Gly Thr Pro Val Tyr Thr Ser Thr Gln Thr Pro Gln Leu Leu Leu
435 440 445

Asp His Gln Leu Tyr Glu His Asp Gly Asd Leu Val Leu Ala Trp Asp
450 455 460

Ile Val Asp Gly Val Phe Pro Pro Asp Leu Leu Asp Asp Met Leu Glu
465 470 475 480

Ala Tyr Val Val Phe Leu Arg Arg Leu Thr Glu Glu Pro Trp Gly Glu
485 490 495

Gln Val Arg Cys Ser Leu Pro Pro Ala Gln Leu Glu Ala Arg Ala Ser
500 505 510

Ala Asn Ala Thr Asn Ala Leu Leu Ser Glu His Thr Leu His Gly Leu
515 520 525

Phe Ala Ala Arg Val Glu Gln Leu Pro Met Gln Leu Ala Val Val Ser
530 535 540

Ala Arg Lys Thr Leu Thr Tyr Glu Glu Leu Ser Arg Arg Ser Arg Arg
545 550 555 560

Leu Gly Ala Arg Leu Arg Glu Gln Gly Ala Arg Pro Asn Thr Leu Val
565 570 575

Ala Val Val Met Glu Lys Gly Trp Glu Gln Val Val Ala Val Leu Ala
580 585 590

Val Leu Glu Ser Gly Ala Ala Tyr Val Pro Ile Asp Ala Asp Leu Pro
595 600 605

Ala Glu Arg Ile His Tyr Leu Leu Asp His Gly Glu Val Lys Leu Val
610 615 620

Leu Thr Gln Pro Trp Leu Asp Gly Lys Leu Ser Trp Pro Pro Gly Ile
625 630 635 640

Gln Arg Leu Leu Val Ser Glu Ala Gly Val Glu Gly Asp Gly Asp Gln
645 650 655

Pro Pro Met Met Pro Ile Gln Thr Pro Ser Asp Leu Ala Tyr Val Ile
660 665 670

Tyr Thr Ser Gly Ser Thr Gly Leu Pro Lys Gly Val Met Ile Asp His
675 680 685

Arg Gly Ala Val Asn Thr Ile Leu Asp Ile Asn Glu Arg Phe Glu Ile
690 695 700

Gly Pro Gly Asp Arg Val Leu Ala Leu Ser Ser Leu Ser Phe Asp Leu
705 710 715 720

Ser Val Tyr Asp Val Phe Gly Ile Leu Ala Ala Gly Gly Thr Ile Val
725 730 735

Val Pro Asp Ala Ser Lys Leu Arg Asp Pro Ala His Trp Ala Glu Leu
740 745 750

Ile Glu Arg Glu Lys Val Thr Val Trp Asn Ser Val Pro Ala Leu Met
755 760 765

Arg Met Leu Val Glu His Phe Glu Gly Arg Pro Asp Ser Leu Ala Arg
770 775 780

Ser Leu Arg Leu Ser Leu Leu Ser Gly Asp Trp Ile Pro Val Gly Leu
785 790 795 800

Pro Gly Glu Leu Gln Ala Ile Arg Pro Gly Val Ser Val Ile Ser Leu
805 810 815

Gly Gly Ala Thr Glu Ala Ser Ile Trp Ser Ile Gly Tyr Pro Val Arg
820 825 830

Asn Val Asp Leu Ser Trp Ala Ser Ile Pro Tyr Gly Arg Pro Leu Arg
835 840 845

Asn Gln Thr Phe His Val Leu Asp Glu Ala Leu Glu Pro Arg Pro Val
850 855 860

Trp Val Pro Gly Gln Leu Tyr Ile Gly Gly Val Gly Leu Ala Leu Gly
865 870 875 880

Tyr Trp Arg Asp Glu Glu Lys Thr Arg Lys Ser Phe Leu Val His Pro
885 890 895

Glu Thr Gly Glu Arg Leu Tyr Lys Thr Gly Asp Leu Gly Arg Tyr Leu
900 905 910

Pro Asp Gly Asn Ile Glu Phe Met Gly Arg Glu Asp Asn Gln Ile Lys
915 920 925

Leu Arg Gly Tyr Arg Val Glu Leu Gly Glu Ile Glu Glu Thr Leu Lys
930 935 940

Ser His Pro Asn Val Arg Asp Ala Val Ile Val Pro Val Gly Asn Asp
945 950 955 960

Ala Ala Asn Lys Leu Leu Leu Ala Tyr Val Val Pro Glu Gly Thr Arg
965 970 975

Arg Arg Ala Ala Glu Gln Asp Ala Ser Leu Lys Thr Glu Arg Ile Asp
980 985 990

Ala Arg Ala His Ala Ala Glu Ala Asp Gly Leu Ser Asp Gly Glu Arg
995 1000 1005

Val Gln Phe Lys Leu Ala Arg His Gly Leu Arg Arg Asp Leu Asp Gly
1010 1015 1020 

Lys Pro Val Val Asp Leu Thr Gly Gln Asp Pro Arg Glu Ala Gly Leu
1025 1030 1035 1040

Asp Val Tyr Ala Arg Arg Arg Ser Val Arg Thr Phe Leu Glu Ala Pro
1045 1050 1055

Ile Pro Phe Val Glu Phe Gly Arg Phe Leu Ser Cys Leu Ser Ser Val
1060 1065 1070

Glu Pro Asp Gly Ala Thr Leu Pro Lys Phe Arg Tyr Pro Ser Ala Gly
1075 1080 1085

Ser Thr Tyr Pro Val Gln Thr Tyr Ala Tyr Val Lys Ser Gly Arg Ile
1090 1095 1100

Glu Gly Val Asp Glu Gly Phe Tyr Tyr Tyr His Pro Phe Glu His Arg
1105 1110 1115 1120

Leu Leu Lys Leu Ser Asp His Gly Ile Glu Arg Gly Ala His Val Arg
1125 1130 1135

Gln Asn Phe Asp Val Phe Asp Glu Ala Ala Phe Asn Leu Leu Phe Val
1140 1145 1150

Gly Arg Ile Asp Ala Ile Glu Ser Leu Tyr Gly Ser Ser Ser Arg Glu
1155 1160 1165

Phe Cys Leu Leu Glu Ala Gly Tyr Met Ala Gln Leu Leu Met Glu Gln
1170 1175 1180

Ala Pro Ser Cys Asn Ile Gly Val Cys Pro Val Gly Gln Phe Asn Phe
1185 1190 1195 1200

Glu Gln Val Arg Pro Val Leu Asp Leu Arg His Ser Asp Val Tyr Val
1205 1210 1215

His Gly Met Leu Gly Gly Arg Val Asp Pro Arg Gln Phe Gln Val Cys
1220 1225 1230

Thr Leu Gly Gln Asp Ser Ser Pro Arg Arg Ala Thr Thr Arg Gly Ala
1235 1240 1245

Pro Pro Gly Arg Glu Gln His Phe Ala Asp Met Leu Arg Asp Phe Leu
1250 1255 1260

Arg Thr Lys Leu Pro Glu Tyr Met Val Pro Thr Val Phe Val Glu Leu
1265 1270 1275 1280

Asp Ala Leu Pro Leu Thr Ser Asn Gly Lys Val Asp Arg Lys Ala Leu
1285 1290 1295

Arg Glu Arg Lys Asp Thr Ser Ser Pro Arg His Ser Gly His Thr Ala
1300 1305 1310

Pro Arg Asp Ala Leu Glu Glu Ile Leu Val Ala Val Val Arg Glu Val
1315 1320 1325

Leu Gly Leu Glu Val Val Gly Leu Gln Gln Ser Phe Val Asp Leu Gly
1330 1335 1340

Ala Thr Ser Ile His Ile Val Arg Met Arg Ser Leu Leu Gln Lys Arg
1345 1350 1355 1360

Leu Asp Arg Glu Ile Ala Ile Thr Glu Leu Phe Gln Tyr Pro Asn Leu
1365 1370 1375

Gly Ser Leu Ala Ser Gly Leu Arg Arg Asp Ser Arg Asp Leu Asp Gln
1380 1385 1390

Arg Pro Asn Met Gln Asp Arg Val Glu Val Arg Arg Lys Gly Arg Arg
1395 1400 1405

Arg Ser 
1410 



<210> 4 
<211> 1832 
<212> PRT
<213> Sorangium cellulosum 

<400> 4 

Met Glu Glu Gln Glu Ser Ser Ala Ile Ala Val Ile Gly Met Ser Gly
1 5 10 15

Arg Phe Pro Gly Ala Arg Asp Leu Asp Glu Phe Trp Arg Asn Leu Arg
20 25 30

Asp Gly Thr Glu Ala Val Gln Arg Phe Ser Glu Gln Glu Leu Ala Ala
35 40 45

Ser Gly Val Asp Pro Ala Leu Val Leu Asp Pro Ser Tyr Val Arg Ala
50 55 60

Gly Ser Val Leu Glu Asp Val Asp Arg Phe Asp Ala Ala Phe Phe Gly
65 70 75 80

Ile Ser Pro Arg Glu Ala Glu Leu Met Asp Pro Gln His Arg Ile Phe
85 90 95

Met Glu Cys Ala Trp Glu Ala Leu Glu Asn Ala Gly Tyr Asp Pro Thr
100 105 110

Ala Tyr Glu Gly Ser Ile Gly Val Tyr Ala Gly Ala Asn Met Ser Ser
115 120 125

Tyr Leu Thr Ser Asn Leu His Glu His Pro Ala Met Met Arg Trp Pro
130 135 140

Gly Trp Phe Gln Thr Leu Ile Gly Asn Asp Lys Asp Tyr Leu Ala Thr
145 150 155 160

His Val Ser Tyr Arg Leu Asn Leu Arg Gly Pro Ser Ile Ser Val Gln
165 170 175

Thr Ala Cys Ser Thr Ser Leu Val Ala Val His Leu Ala Cys Met Ser
180 185 190

Leu Leu Asp Arg Glu Cys Asp Met Ala Leu Ala Gly Gly Ile Thr Val
195 200 205

Arg Ile Pro His Arg Ala Gly Tyr Val Tyr Ala Glu Gly Gly Ile Phe
210 215 220 

Ser Pro Asp Gly His Cys Arg Ala Phe Asp Ala Lys Ala Asn Gly Thr 
225 230 235 240 

Ile Met Gly Asn Gly Cys Gly Val Val Leu Leu Lys Pro Leu Asp Arg
245 250 255

Ala Leu Ser Asp Gly Asp Pro Val Arg Ala Val Ile Leu Gly Ser Ala
260 265 270

Thr Asn Asn Asp Gly Ala Arg Lys Ile Gly Phe Thr Ala Pro Ser Glu
275 280 285

Val Gly Gln Ala Gln Ala Ile Met Glu Ala Leu Ala Leu Ala Gly Val
290 295 300

Glu Ala Arg Ser Ile Gln Tyr Ile Glu Thr His Gly Thr Gly Thr Leu
305 310 315 320

Leu Gly Asp Ala Ile Glu Thr Ala Ala Leu Arg Arg Val Phe Gly Arg
325 330 335

Asp Ala Ser Ala Arg Arg Ser Cys Ala Ile Gly Ser Val Lys Thr Gly
340 345 350

Ile Gly His Leu Glu Ser Ala Ala Gly Ile Ala Gly Leu Ile Lys Thr
355 360 365

Val Leu Ala Leu Glu His Arg Gln Leu Pro Pro Ser Leu Asn Phe Glu
370 375 380

Ser Pro Asn Pro Ser Ile Asp Phe Ala Ser Ser Pro Phe Tyr Val Asn
385 390 395 400

Thr Ser Leu Lys Asp Trp Asn Thr Gly Ser Thr Pro Arg Arg Ala Gly
405 410 415

Val Ser Ser Phe Gly Ile Gly Gly Thr Asn Ala His Val Val Leu Glu
420 425 430

Glu Ala Pro Ala Ala Lys Leu Pro Ala Ala Ala Pro Ala Arg Ser Ala
435 440 445

Glu Leu Phe Val Val Ser Ala Lys Ser Ala Ala Ala Leu Asp Ala Ala
450 455 460

Ala Ala Arg Leu Arg Asp His Leu Gln Ala His Gln Gly Ile Ser Leu
465 470 475 480

Gly Asp Val Ala Phe Ser Leu Ala Thr Thr Arg Ser Pro Met Glu His
485 490 495

Arg Leu Ala Met Ala Ala Pro Ser Arg Glu Ala Leu Arg Glu Gly Leu
500 505 510

Asp Ala Ala Ala Arg Gly Gln Thr Pro Pro Gly Ala Val Arg Gly Arg
515 520 525



WO 99/66028 



PCT/EP99/04171 



-29- 



Cys Ser Pro Gly Asn Val Pro Lys Val Val Phe Val Phe Pro Gly Gln
530 535 540

Gly Ser Gln Trp Val Gly Met Gly Arg Gln Leu Leu Ala Glu Glu Pro
545 550 555 560

Val Phe His Ala Ala Leu Ser Ala Cys Asp Arg Ala Ile Gln Ala Glu
565 570 575

Ala Gly Trp Ser Leu Leu Ala Glu Leu Ala Ala Asp Glu Gly Ser Ser
580 585 590

Gln Leu Glu Arg Ile Asp Val Val Gln Pro Val Leu Phe Ala Leu Ala
595 600 605

Val Ala Phe Ala Ala Leu Trp Arg Ser Trp Gly Val Ala Pro Asp Val
610 615 620

Val Ile Gly His Ser Met Gly Glu Val Ala Ala Ala His Val Ala Gly
625 630 635 640

Ala Leu Ser Leu Glu Asp Ala Val Ala Ile Ile Cys Arg Arg Ser Arg
645 650 655

Leu Leu Arg Arg Ile Ser Gly Gln Gly Glu Met Ala Val Thr Glu Leu
660 665 670

Ser Leu Ala Glu Ala Glu Ala Ala Leu Arg Gly Tyr Glu Asp Arg Val
675 680 685

Ser Val Ala Val Ser Asn Ser Pro Arg Ser Thr Val Leu Ser Gly Glu
690 695 700

Pro Ala Ala Ile Gly Glu Val Leu Ser Ser Leu Asn Ala Lys Gly Val
705 710 715 720

Phe Cys Arg Arg Val Lys Val Asp Val Ala Ser His Ser Pro Gln Val
725 730 735

Asp Pro Leu Arg Glu Asp Leu Leu Ala Ala Leu Gly Gly Leu Arg Pro
740 745 750

Gly Ala Ala Ala Val Pro Met Arg Ser Thr Val Thr Gly Ala Met Val
755 760 765

Ala Gly Pro Glu Leu Gly Ala Asn Tyr Trp Met Asn Asn Leu Arg Gln
770 775 780

Pro Val Arg Phe Ala Glu Val Val Gln Ala Gln Leu Gln Gly Gly His
785 790 795 800

Gly Leu Phe Val Glu Met Ser Pro His Pro Ile Leu Thr Thr Ser Val
805 810 815

Glu Glu Met Arg Arg Ala Ala Gln Arg Ala Gly Ala Ala Val Gly Ser
820 825 830

Leu Arg Arg Gly Gln Asp Glu Arg Pro Ala Met Leu Glu Ala Leu Gly
835 840 845

Thr Leu Trp Ala Gln Gly Tyr Pro Val Pro Trp Gly Arg Leu Phe Pro
850 855 860

Ala Gly Gly Arg Arg Val Pro Leu Pro Thr Tyr Pro Trp Gln Arg Glu
865 870 875 880

Arg Tyr Trp Ile Glu Ala Pro Ala Lys Ser Ala Ala Gly Asp Arg Arg
885 890 895

Gly Val Arg Ala Gly Gly His Pro Leu Leu Gly Glu Met Gln Thr Leu
900 905 910

Ser Thr Gln Thr Ser Thr Arg Leu Trp Glu Thr Thr Leu Asp Leu Lys
915 920 925

Arg Leu Pro Trp Leu Gly Asp His Arg Val Gln Gly Ala Val Val Phe
930 935 940

Pro Gly Ala Ala Tyr Leu Glu Met Ala Ile Ser Ser Gly Ala Glu Ala
945 950 955 960

Leu Gly Asp Gly Pro Leu Gln Ile Thr Asp Val Val Leu Ala Glu Ala
965 970 975

Leu Ala Phe Ala Gly Asp Ala Ala Val Leu Val Gln Val Val Thr Thr
980 985 990

Glu Gln Pro Ser Gly Arg Leu Gln Phe Gln Ile Ala Ser Arg Ala Pro
995 1000 1005

Gly Ala Gly His Ala Ser Phe Arg Val His Ala Arg Gly Ala Leu Leu
1010 1015 1020

Arg Val Glu Arg Thr Glu Val Pro Ala Gly Leu Thr Leu Ser Ala Val
1025 1030 1035 1040

Arg Ala Arg Leu Gln Ala Ser Ile Pro Ala Ala Ala Thr Tyr Ala Glu
1045 1050 1055

Leu Thr Glu Met Gly Leu Gln Tyr Gly Pro Ala Phe Gln Gly Ile Ala
1060 1065 1070

Glu Leu Trp Arg Gly Glu Gly Glu Ala Leu Gly Arg Val Arg Leu Pro
1075 1080 1085

Asp Ala Ala Gly Ser Ala Ala Glu Tyr Arg Leu His Pro Ala Leu Leu
1090 1095 1100

Asp Ala Cys Phe Gln Ile Val Gly Ser Leu Phe Ala Arg Ser Gly Glu
1105 1110 1115 1120

Ala Thr Pro Trp Val Pro Val Glu Leu Gly Ser Leu Arg Leu Leu Gln
1125 1130 1135

Arg Pro Ser Gly Glu Leu Trp Cys His Ala Arg Val Val Asn His Gly
1140 1145 1150

His Gln Thr Pro Asp Arg Gln Gly Ala Asp Phe Trp Val Val Asp Ser
1155 1160 1165

Ser Gly Ala Val Val Ala Glu Val Cys Gly Leu Val Ala Gln Arg Leu
1170 1175 1180 

Pro Gly Gly Val Arg Arg Arg Glu Glu Asp Asp Trp Phe Leu Glu Leu 
1185 1190 1195 1200 

Glu Trp Glu Pro Ala Ala Val Gly Thr Ala Lys Val Asn Ala Gly Arg 
1205 1210 1215 



Trp Leu Leu Leu Gly Gly Gly Gly Gly Leu Gly Ala Ala Leu Arg Ala
1220 1225 1230

Met Leu Glu Ala Gly Gly His Ala Val Val His Ala Ala Glu Asn Asn
1235 1240 1245

Thr Ser Ala Ala Gly Val Arg Ala Leu Leu Ala Lys Ala Phe Asp Gly
1250 1255 1260

Gln Ala Pro Thr Ala Val Val His Leu Gly Ser Leu Asp Gly Gly Gly
1265 1270 1275 1280

Glu Leu Asp Pro Gly Leu Gly Ala Gln Gly Ala Leu Asp Ala Pro Arg
1285 1290 1295

Ser Ala Asp Val Ser Pro Asp Ala Leu Asp Pro Ala Leu Val Arg Gly
1300 1305 1310

Cys Asp Ser Val Leu Trp Thr Val Gln Ala Leu Ala Gly Met Gly Phe
1315 1320 1325

Arg Asp Ala Pro Arg Leu Trp Leu Leu Thr Arg Gly Ala Gln Ala Val
1330 1335 1340

Gly Ala Gly Asp Val Ser Val Thr Gln Ala Pro Leu Leu Gly Leu Gly
1345 1350 1355 1360

Arg Val Ile Ala Met Glu His Ala Asp Leu Arg Cys Ala Arg Val Asp
1365 1370 1375

Leu Asp Pro Ala Arg Pro Glu Gly Glu Leu Ala Ala Leu Leu Ala Glu
1380 1385 1390

Leu Leu Ala Asp Asp Ala Glu Ala Glu Val Ala Leu Arg Gly Gly Glu
1395 1400 1405

Arg Cys Val Ala Arg Ile Val Arg Arg Gln Pro Glu Thr Arg Pro Arg
1410 1415 1420

Gly Arg Ile Glu Ser Cys Val Pro Thr Asp Val Thr Ile Arg Ala Asp
1425 1430 1435 1440

Ser Thr Tyr Leu Val Thr Gly Gly Leu Gly Gly Leu Gly Leu Ser Val
1445 1450 1455

Ala Gly Trp Leu Ala Glu Arg Gly Ala Gly His Leu Val Leu Val Gly
1460 1465 1470

Arg Ser Gly Ala Ala Ser Val Glu Gln Arg Ala Ala Val Ala Ala Leu
1475 1480 1485

Glu Ala Arg Gly Ala Arg Val Thr Val Ala Lys Ala Asp Val Ala Asp
1490 1495 1500

Arg Ala Gln Leu Glu Arg Ile Leu Arg Glu Val Thr Thr Ser Gly Met
1505 1510 1515 1520

Pro Leu Arg Gly Val Val His Ala Ala Gly Ile Leu Asp Asp Gly Leu
1525 1530 1535

Leu Met Gln Gln Thr Pro Ala Arg Phe Arg Lys Val Met Ala Pro Lys
1540 1545 1550

Val Gln Gly Ala Leu His Leu His Ala Leu Thr Arg Glu Ala Pro Leu
1555 1560 1565

Ser Phe Phe Val Leu Tyr Ala Ser Gly Val Gly Leu Leu Gly Ser Pro
1570 1575 1580

Gly Gln Gly Asn Tyr Ala Ala Ala Asn Thr Phe Leu Asp Ala Leu Ala
1585 1590 1595 1600

His His Arg Arg Ala Gln Gly Leu Pro Ala Leu Ser Val Asp Trp Gly
1605 1610 1615

Leu Phe Ala Glu Val Gly Met Ala Ala Ala Gln Glu Asp Arg Gly Ala
1620 1625 1630

Arg Leu Val Ser Arg Gly Met Arg Ser Leu Thr Pro Asp Glu Gly Leu
1635 1640 1645

Ser Ala Leu Ala Arg Leu Leu Glu Ser Gly Arg Ala Gln Val Gly Val
1650 1655 1660

Met Pro Val Asn Pro Arg Leu Trp Val Glu Leu Tyr Pro Ala Ala Ala
1665 1670 1675 1680

Ser Ser Arg Met Leu Ser Arg Leu Val Thr Ala His Arg Ala Ser Ala
1685 1690 1695

Gly Gly Pro Ala Gly Asp Gly Asp Leu Leu Arg Arg Leu Ala Ala Ala
1700 1705 1710

Glu Pro Ser Ala Arg Ser Ala Leu Leu Glu Pro Leu Leu Arg Ala Gln
1715 1720 1725

Ile Ser Gln Val Leu Arg Leu Pro Glu Gly Lys Ile Glu Val Asp Ala
1730 1735 1740

Pro Leu Thr Ser Leu Gly Met Asn Ser Leu Met Gly Leu Glu Leu Arg
1745 1750 1755 1760

Asn Arg Ile Glu Ala Met Leu Gly Ile Thr Val Pro Ala Thr Leu Leu
1765 1770 1775

Trp Thr Tyr Pro Thr Val Ala Ala Leu Ser Gly His Leu Ala Arg Glu
1780 1785 1790

Ala Cys Glu Ala Ala Pro Val Glu Ser Pro His Thr Thr Ala Asp Ser
1795 1800 1805

Ala Val Glu Ile Glu Glu Met Ser Gln Asp Asp Leu Thr Gln Leu Ile
1810 1815 1820 

Ala Ala Lys Phe Lys Ala Leu Thr 
1825 1830 



<210> 5 
<211> 7257 
<212> PRT

<213> Sorangium cellulosum 
<400> 5 

Met Thr Thr Arg Gly Pro Thr Ala Gln Gln Asn Pro Leu Lys Gln Ala
1 5 10 15

Ala Ile Ile Ile Gln Arg Leu Glu Glu Arg Leu Ala Gly Leu Ala Gln
20 25 30

Ala Glu Leu Glu Arg Thr Glu Pro Ile Ala Ile Val Gly Ile Gly Cys
35 40 45

Arg Phe Pro Gly Gly Ala Asp Ala Pro Glu Ala Phe Trp Glu Leu Leu
50 55 60

Asp Ala Glu Arg Asp Ala Val Gln Pro Leu Asp Met Arg Trp Ala Leu
65 70 75 80

Val Gly Val Ala Pro Val Glu Ala Val Pro His Trp Ala Gly Leu Leu
85 90 95

Thr Glu Pro Ile Asp Cys Phe Asp Ala Ala Phe Phe Gly Ile Ser Pro
100 105 110

Arg Glu Ala Arg Ser Leu Asp Pro Gln His Arg Leu Leu Leu Glu Val
115 120 125

Ala Trp Glu Gly Leu Glu Asp Ala Gly Ile Pro Pro Arg Ser Ile Asp
130 135 140

Gly Ser Arg Thr Gly Val Phe Val Gly Ala Phe Thr Ala Asp Tyr Ala
145 150 155 160

Arg Thr Val Ala Arg Leu Pro Arg Glu Glu Arg Asp Ala Tyr Ser Ala
165 170 175

Thr Gly Asn Met Leu Ser Ile Ala Ala Gly Arg Leu Ser Tyr Thr Leu
180 185 190

Gly Leu Gln Gly Pro Cys Leu Thr Val Asp Thr Ala Cys Ser Ser Ser
195 200 205

Leu Val Ala Ile His Leu Ala Cys Arg Ser Leu Arg Ala Gly Glu Ser
210 215 220

Asp Leu Ala Leu Ala Gly Gly Val Ser Ala Leu Leu Ser Pro Asp Met
225 230 235 240

Met Glu Ala Ala Ala Arg Thr Gln Ala Leu Ser Pro Asp Gly Arg Cys
245 250 255

Arg Thr Phe Asp Ala Ser Ala Asn Gly Phe Val Arg Gly Glu Gly Cys
260 265 270

Gly Leu Val Val Leu Lys Arg Leu Ser Asp Ala Gln Arg Asp Gly Asp
275 280 285

Arg Ile Trp Ala Leu Ile Arg Gly Ser Ala Ile Asn His Asp Gly Arg
290 295 300 

Ser Thr Gly Leu Thr Ala Pro Asn Val Leu Ala Gln Glu Thr Val Leu
305 310 315 320

Arg Glu Ala Leu Arg Ser Ala His Val Glu Ala Gly Ala Val Asp Tyr
325 330 335

Val Glu Thr His Gly Thr Gly Thr Ser Leu Gly Asp Pro Ile Glu Val
340 345 350

Glu Ala Leu Arg Ala Thr Val Gly Pro Ala Arg Ser Asp Gly Thr Arg
355 360 365

Cys Val Leu Gly Ala Val Lys Thr Asn Ile Gly His Leu Glu Ala Ala
370 375 380

Ala Gly Val Ala Gly Leu Ile Lys Ala Ala Leu Ser Leu Thr His Glu
385 390 395 400

Arg Ile Pro Arg Asn Leu Asn Phe Arg Thr Leu Asn Pro Arg Ile Arg
405 410 415

Leu Glu Gly Ser Ala Leu Ala Leu Ala Thr Glu Pro Val Pro Trp Pro
420 425 430

Arg Thr Asp Arg Pro Arg Phe Ala Gly Val Ser Ser Phe Gly Met Ser
435 440 445

Gly Thr Asn Ala His Val Val Leu Glu Glu Ala Pro Ala Val Glu Leu
450 455 460

Trp Pro Ala Ala Pro Glu Arg Ser Ala Glu Leu Leu Val Leu Ser Gly
465 470 475 480

Lys Ser Glu Gly Ala Leu Asp Ala Gln Ala Ala Arg Leu Arg Glu His
485 490 495

Leu Asp Met His Pro Glu Leu Gly Leu Gly Asp Val Ala Phe Ser Leu
500 505 510

Ala Thr Thr Arg Ser Ala Met Ser His Arg Leu Ala Val Ala Val Thr
515 520 525

Ser Arg Glu Gly Leu Leu Ala Ala Leu Ser Ala Val Ala Gln Gly Gln
530 535 540

Thr Pro Ala Gly Ala Ala Arg Cys Ile Ala Ser Ser Ser Arg Gly Lys
545 550 555 560

Leu Ala Phe Leu Phe Thr Gly Gln Gly Ala Gln Thr Pro Gly Met Gly
565 570 575

Arg Gly Leu Cys Ala Ala Trp Pro Ala Phe Arg Glu Ala Phe Asp Arg
580 585 590

Cys Val Ala Leu Phe Asp Arg Glu Leu Asp Arg Pro Leu Arg Glu Val
595 600 605

Met Trp Ala Glu Ala Gly Ser Ala Glu Ser Leu Leu Leu Asp Gln Thr
610 615 620

Ala Phe Thr Gln Pro Ala Leu Phe Ala Val Glu Tyr Ala Leu Thr Ala
625 630 635 640 

Leu Trp Arg Ser Trp Gly Val Glu Pro Glu Leu Leu Val Gly His Ser
645 650 655

Ile Gly Glu Leu Val Ala Ala Cys Val Ala Gly Val Phe Ser Leu Glu
660 665 670

Asp Gly Val Arg Leu Val Ala Ala Arg Gly Arg Leu Met Gln Gly Leu
675 680 685

Ser Ala Gly Gly Ala Met Val Ser Leu Gly Ala Pro Glu Ala Glu Val
690 695 700

Ala Ala Ala Val Ala Pro His Ala Ala Ser Val Ser Ile Ala Ala Val
705 710 715 720

Asn Gly Pro Glu Gln Val Val Ile Ala Gly Val Glu Gln Ala Val Gln
725 730 735

Ala Ile Ala Ala Gly Phe Ala Ala Arg Gly Ala Arg Thr Lys Arg Leu
740 745 750

His Val Ser His Ala Phe His Ser Pro Leu Met Glu Pro Met Leu Glu
755 760 765

Glu Phe Gly Arg Val Ala Ala Ser Val Thr Tyr Arg Arg Pro Ser Val
770 775 780

Ser Leu Val Ser Asn Leu Ser Gly Lys Val Val Thr Asp Glu Leu Ser
785 790 795 800

Ala Pro Gly Tyr Trp Val Arg His Val Arg Glu Ala Val Arg Phe Ala
805 810 815

Asp Gly Val Lys Ala Leu His Glu Ala Gly Ala Gly Thr Phe Val Glu
820 825 830

Val Gly Pro Lys Pro Thr Leu Leu Gly Leu Leu Pro Ala Cys Leu Pro
835 840 845

Glu Ala Glu Pro Thr Leu Leu Ala Ser Leu Arg Ala Gly Arg Glu Glu
850 855 860

Ala Ala Gly Val Leu Glu Ala Leu Gly Arg Leu Trp Ala Ala Gly Gly
865 870 875 880

Ser Val Ser Trp Pro Gly Val Phe Pro Thr Ala Gly Arg Arg Val Pro
885 890 895

Leu Pro Thr Tyr Pro Trp Gln Arg Gln Arg Tyr Trp Ile Glu Ala Pro
900 905 910

Ala Glu Gly Leu Gly Ala Thr Ala Ala Asp Ala Leu Ala Gln Trp Phe
915 920 925

Tyr Arg Val Asp Trp Pro Glu Met Pro Arg Ser Ser Val Asp Ser Arg
930 935 940

Arg Ala Arg Ser Gly Gly Trp Leu Val Leu Ala Asp Arg Gly Gly Val
945 950 955 960

Gly Glu Ala Ala Ala Ala Ala Leu Ser Ser Gln Gly Cys Ser Cys Ala
965 970 975

Val Leu His Ala Pro Ala Glu Ala Ser Ala Val Ala Glu Gln Val Thr
980 985 990

Gln Ala Leu Gly Gly Arg Asn Asp Trp Gln Gly Val Leu Tyr Leu Trp
995 1000 1005

Gly Leu Asp Ala Val Val Glu Ala Gly Ala Ser Ala Glu Glu Val Ala
1010 1015 1020

Lys Val Thr His Leu Ala Ala Ala Pro Val Leu Ala Leu Ile Gln Ala
1025 1030 1035 1040

Leu Gly Thr Gly Pro Arg Ser Pro Arg Leu Trp Ile Val Thr Arg Gly
1045 1050 1055

Ala Cys Thr Val Gly Gly Glu Pro Asp Ala Ala Pro Cys Gln Ala Ala
1060 1065 1070 



Leu Trp Gly Met Gly Arg Val Ala Ala Leu Glu His Pro Gly Ser Trp
1075 1080 1085

Gly Gly Leu Val Asp Leu Asp Pro Glu Glu Ser Pro Thr Glu Val Glu
1090 1095 1100

Ala Leu Val Ala Glu Leu Leu Ser Pro Asp Ala Glu Asp Gln Leu Ala
1105 1110 1115 1120

Phe Arg Gln Gly Arg Arg Arg Ala Ala Arg Leu Val Ala Ala Pro Pro
1125 1130 1135

Glu Gly Asn Ala Ala Pro Val Ser Leu Ser Ala Glu Gly Ser Tyr Leu
1140 1145 1150

Val Thr Gly Gly Leu Gly Ala Leu Gly Leu Leu Val Ala Arg Trp Leu
1155 1160 1165

Val Glu Arg Gly Ala Gly His Leu Val Leu Ile Ser Arg His Gly Leu
1170 1175 1180

Pro Asp Arg Glu Glu Trp Gly Arg Asp Gln Pro Pro Glu Val Arg Ala
1185 1190 1195 1200

Arg Ile Ala Ala Ile Glu Ala Leu Glu Ala Gln Gly Ala Arg Val Thr
1205 1210 1215

Val Ala Ala Val Asp Val Ala Asp Ala Glu Gly Met Ala Ala Leu Leu
1220 1225 1230

Ala Ala Val Glu Pro Pro Leu Arg Gly Val Val His Ala Ala Gly Leu
1235 1240 1245

Leu Asp Asp Gly Leu Leu Ala His Gln Asp Ala Gly Arg Leu Ala Arg
1250 1255 1260

Val Leu Arg Pro Lys Val Glu Gly Ala Trp Val Leu His Thr Leu Thr
1265 1270 1275 1280

Arg Glu Gln Pro Leu Asp Leu Phe Val Leu Phe Ser Ser Ala Ser Gly
1285 1290 1295

Val Phe Gly Ser Ile Gly Gln Gly Ser Tyr Ala Ala Gly Asn Ala Phe
1300 1305 1310

Leu Asp Ala Leu Ala Asp Leu Arg Arg Thr Gln Gly Leu Ala Ala Leu
1315 1320 1325

Ser Ile Ala Trp Gly Leu Trp Ala Glu Gly Gly Met Gly Ser Gln Ala
1330 1335 1340

Gln Arg Arg Glu His Glu Ala Ser Gly Ile Trp Ala Met Pro Thr Ser
1345 1350 1355 1360

Arg Ala Leu Ala Ala Met Glu Trp Leu Leu Gly Thr Arg Ala Thr Gln
1365 1370 1375

Arg Val Val Ile Gln Met Asp Trp Ala His Ala Gly Ala Ala Pro Arg
1380 1385 1390

Asp Ala Ser Arg Gly Arg Phe Trp Asp Arg Leu Val Thr Ala Thr Lys
1395 1400 1405

Glu Ala Ser Ser Ser Ala Val Pro Ala Val Glu Arg Trp Arg Asn Ala
1410 1415 1420

Ser Val Val Glu Thr Arg Ser Ala Leu Tyr Glu Leu Val Arg Gly Val
1425 1430 1435 1440

Val Ala Gly Val Met Gly Phe Thr Asp Gln Gly Thr Leu Asp Val Arg
1445 1450 1455

Arg Gly Phe Ala Glu Gln Gly Leu As£ Ser Leu Met Ala Val Glu Ile
1460 1465 1470

Arg Lys Arg Leu Gln Gly Glu Leu Gly Met Pro Leu Ser Ala Thr Leu
1475 1480 1485

Ala Phe Asp His Pro Thr Val Glu Arg Leu Val Glu Tyr Leu Leu Ser
1490 1495 1500

Gln Ala Leu Glu Leu Gln Asp Arg Thr Asp Val Arg Ser Val Arg Leu
1505 1510 1515 1520

Pro Ala Thr Glu Asp Pro Ile Ala Ile Val Gly Ala Ala Cys Arg Phe
1525 1530 1535

Pro Gly Gly Val Glu Asp Leu Glu Ser Tyr Trp Gln Leu Leu Thr Glu
1540 1545 1550

Gly Val Val Val Ser Thr Glu Val Pro Ala Asp Arg Trp Asn Gly Ala
1555 1560 1565

Asp Gly Arg Val Pro Gly Ser Gly Glu Ala Gln Arg Gln Thr Tyr Val
1570 1575 1580

Pro Arg Gly Gly Phe Leu Arg Glu Val Glu Thr Phe Asp Ala Ala Phe
1585 1590 1595 1600

Phe His Ile Ser Pro Arg Glu Ala Met Ser Leu Asp Pro Gln Gln Arg
1605 1610 1615

Leu Leu Leu Glu Val Ser Trp Glu Ala Ile Glu Arg Ala Gly Gln Asp
1620 1625 1630

Pro Ser Ala Leu Arg Glu Ser Pro Thr Gly Val Phe Val Gly Ala Gly 
1635 1640 1645 

Pro Asn Glu Tyr Ala Glu Arg Val Gln Glu Leu Ala Asp Glu Ala Ala
1650 1655 1660

Gly Leu Tyr Ser Gly Thr Gly Asn Met Leu Ser Val Ala Ala Gly Arg
1665 1670 1675 1680

Leu Ser Phe Phe Leu Gly Leu His Gly Pro Thr Leu Ala Val Asp Thr
1685 1690 1695

Ala Cys Ser Ser Ser Leu Val Ala Leu His Leu Gly Cys Gln Ser Leu
1700 1705 1710

Arg Arg Gly Glu Cys Asp Gln Ala Leu Val Gly Gly Val Asn Met Leu
1715 1720 1725

Leu Ser Pro Lys Thr Phe Ala Leu Leu Ser Arg Met His Ala Leu Ser
1730 1735 1740

Pro Gly Gly Arg Cys Lys Thr Phe Ser Ala Asp Ala Asp Gly Tyr Ala
1745 1750 1755 1760

Arg Ala Glu Gly Cys Ala Val Val Val Leu Lys Arg Leu Ser Asp Ala
1765 1770 1775

Gln Arg Asp Arg Asp Pro Ile Leu Ala Val Ile Arg Gly Thr Ala Ile
1780 1785 1790

Asn His Asp Gly Pro Ser Ser Gly Leu Thr Val Pro Ser Gly Pro Ala
1795 1800 1805

Gln Glu Ala Leu Leu Arg Gln Ala Leu Ala His Ala Gly Val Val Pro
1810 1815 1820

Qtr-V Val Phe Val Glu Cys His Gly Thr Gly Thr Ala Leu Gly
1825 1830 1835 1840

Asp Pro Ile Glu Val Arg Ala Leu Ser Asp Val Tyr Gly Gln Ala Arg
1845 1850 1855

Pro Ala Asp Arg Pro Leu Ile Leu Gly Ala Ala Lys Ala Asn Leu Gly
1860 1865 1870

His Met Glu Pro Ala Ala Gly Leu Ala Gly Leu Leu Lys Ala Val Leu
1875 1880 1885

Ala ,oo* Gly Gln Glu Gln Ile Pro Ala Gln Pro Glu Leu Gly Glu Leu
1890 1895 1900

Asn Pro Leu Leu Pro Trp Glu Ala Leu Pro Val Ala Val Ala Arg Ala
1905 1910 1915 1920

Ala Val Pro Trp Pro Arg Thr Asp Arg Pro Arg Phe Ala Gly Val Ser
1925 1930 1935

Ser Phe Gly Met Ser Gly Thr Asn Ala His Val Val Leu Glu Glu Ala
1940 1945 1950

Pro Ala Val Glu Leu Trp Pro Ala Ala Pro Glu Arg Ser Ala Glu Leu
1955 1960 1965

Leu Val Leu Ser Gly Lys Ser Glu Gly Ala Leu Asp Ala Gln Ala Ala
1970 1975 1980 

Arg Leu Arg Glu His Leu Asp Met His Pro Glu Leu Gly Leu Gly Asp 
1985 1990 1995 2000 

Val Ala Phe Ser Leu Ala Thr Thr Arg Ser Ala Met Asn His Arg Leu 
2005 2010 2015 

Ala Val Ala Val Thr Ser Arg Glu Gly Leu Leu Ala Ala Leu Ser Ala 
2020 2025 2030 

Val Ala Gln Gly Gln Thr Pro Pro Gly Ala Ala Arg Cys Ile Ala Ser
2035 2040 2045

Ser Ser Arg Gly Lys Leu Ala Phe Leu Phe Thr Gly Gln Gly Ala Gln
2050 2055 2060

Thr Pro Gly Met Gly Arg Gly Leu Cys Ala Ala Trp Pro Ala Phe Arg
2065 2070 2075 2080

Glu Ala Phe Asp Arg Cys Val Ala Leu Phe Asp Arg Glu Leu Asp Arg
2085 2090 2095

Pro Leu Arg Glu Val Met Trp Ala Glu Pro Gly Ser Ala Glu Ser Leu
2100 2105 2110

Leu Leu Asp Gln Thr Ala Phe Thr Gln Pro Ala Leu Phe Thr Val Glu
2115 2120 2125

Tyr Ala Leu Thr Ala Leu Trp Arg Ser Trp Gly Val Glu Pro Glu Leu
2130 2135 2140

Val Ala Gly His Ser Ala Gly Glu Leu Val Ala Ala Cys Val Ala Gly
2145 2150 2155 2160

Val Phe Ser Leu Glu Asp Gly Val Arg Leu Val Ala Ala Arg Gly Arg
2165 2170 2175

Leu Met Gln Gly Leu Ser Ala Gly Gly Ala Met Val Ser Leu Gly Ala
2180 2185 2190

Pro Glu Ala Glu Val Ala Ala Ala Val Ala Pro His Ala Ala Ser Val
2195 2200 2205

Ser Ile Ala Ala Val Asn Gly Pro Glu Gln Val Val Ile Ala Gly Val
2210 2215 2220

Glu Gln Ala Val Gln Ala Ile Ala Ala Gly Phe Ala Ala Arg Gly Ala
2225 2230 2235 2240

Arg Thr Lys Arg Leu His Val Ser His Ala Ser His Ser Pro Leu Met
2245 2250 2255

Glu Pro Met Leu Glu Glu Phe Gly Arg Val Ala Ala Ser Val Thr Tyr
2260 2265 2270

Arg Arg Pro Ser Val Ser Leu Val Ser Asn Leu Ser Gly Lys Val Val
2275 2280 2285

Ala Asp Glu Leu Ser Ala Pro Gly Tyr Trp Val Arg His Val Arg Glu
2290 2295 2300

Ala Val Arg Phe Ala Asp Gly Val Lys Ala Leu His Glu Ala Gly Ala
2305 2310 2315 2320

Gly Thr Phe Val Glu Val Gly Pro Lys Pro Thr Leu Leu Gly Leu Leu
2325 2330 2335

Pro Ala Cys Leu Pro Glu Ala Glu Pro Thr Leu Leu Ala Ser Leu Arg
2340 2345 2350

Ala Gly Arg Glu Glu Ala Ala Gly Val Leu Glu Ala Leu Gly Arg Leu
2355 2360 2365

Trp Ala Ala Gly Gly Ser Val Ser Trp Pro Gly Val Phe Pro Thr Ala
2370 2375 2380

Gly Arg Arg Val Pro Leu Pro Thr Tyr Pro Trp Gln Arg Gln Arg Tyr
2385 2390 2395 2400

Trp Pro Asp Ile Glu Pro Asp Ser Arg Arg His Ala Ala Ala Asp Pro
2405 2410 2415

Thr Gln Gly Trp Phe Tyr Arg Val Asp Trp Pro Glu Ile Pro Arg Ser
2420 2425 2430

Leu Gln Lys Ser Glu Glu Ala Ser Arg Gly Ser Trp Leu Val Leu Ala
2435 2440 2445

Asp Lys Gly Gly Val Gly Glu Ala Val Ala Ala Ala Leu Ser Thr Arg
2450 2455 2460 
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2465^ ^ ^ Val -- L6U HiS ?ro *± a Glu Thr Ser Thr 



2470 



2475 



2480 



Ala Glu Leu Val Thr Glu Ala Ala Gly Gly Arg Ser Asp Trp Gin Val 

2485 2490 2495 

Val Leu Tyr^Leu Trp Gly Leu Asp Ala Val Val Gly Ala Glu Ala 



2505 



2510 



Ser 



lie As^Glu He Gly Asp Ala Jhr Arg Arg Ala Thr Ala Pro Val Leu 



2520 2525 
Gly^Leu Ala Arg Phe Leujer Thr Val Ser Cysjer Pro Arg Leu Trp 



2540 



gj. Val Thr Arg Gly^Ala Cys lie Val Gly_Asp Glu Pro Ala lie' Ala 



2550 



2555 



2560 



Pro Cys Gin Ala Ala Leu Trp Gly Met Gly Arg Val Ala Ala Leu Glu 

2570 2575 
His Pro Gly Ala Trp Gly Gly Leu Val Asp Leu Asp Pro Arg Ala Ser 



2585 



2590 



Pro Pro^Gln Ala Ser Pro He^Asp Gly Glu Met Leu Val Thr Glu Leu 



2600 



2605 



Leaser Gin Glu Thr Gl^Asp Gin Leu Ala Phe^Arg His Gly Arg Arg 



2620 



His Ala Ala Arg Leu^Val Ala Ala Pro Pro Gin Gly Gin Ala Ala Pro 
2630 2 «5 2640 

Val ser Leu Serbia Glu Ala Ser Tyr Leu Val Thr Gly Gly Leu G iy 

2645 2650 2655 

Gly Leu Gly Leu lie Val Ala Gin Trp Leu Val Glu Leu Gly Ala Arg 



2665 



2670 



His Leu Val Leu Thr Ser Arg Arg Gly Leu Pro Asp Arg Gin Ala Trp 

2680 2685 

Cys Glu Gin 
2690 



Ala Leu Glu 
2705 

Ala Asp Val 



Gin Pro Pro Glu lie Arg Ala Arg He Ala Ala Val Glu 
2695 2700 

Ala Arg^Gly Ala Arg Val Thr Val Ala Ala Val Asp Val 
2710 2715 2720 

Glu Met Thr Ala Leu Val Ser Ser Val Glu Pro Pro 
2725 2730 2735 

Leu Arg Gly^Val Val His Ala Ala Gly Val Ser Val Met Arg Pro Leu 
2740 2745 2750 

Ala Glu Thr Asp Glu Thr Leu Leu Glu Ser Val Leu Arg Pro Lys Val 



2760 



2765 



SSr TrP LeU ^ "if Arg Leu Leu His Gly Arg Pro Leu Asp 
2775 2780 

Leu^Phe Val Leu Phe^ Ser Ser Gly Ala Ala Val Trp Gly Ser His Ser 



2790 



2795 2800 
Gin Gly Ala Tyr Ala Ala Ala Asn Ala Phe Leu Asp Gly Leu Ala His 
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2805 2810 2815 

Leu Arg Arg Ser Gln Ser Leu Pro Ala Leu Ser Val Ala Trp Gly Leu
2820 2825 2830

Trp Ala Glu Gly Gly Met Ala Asp Ala Glu Ala His Ala Arg Leu Ser
2835 2840 2845

Asp Ile Gly Val Leu Pro Met Ser Thr Ser Ala Ala Leu Ser Ala Leu
2850 2855 2860

Gln Arg Leu Val Glu Thr Gly Ala Ala Gln Arg Thr Val Thr Arg Met
2865 2870 2875 2880

Asp Trp Ala Arg Phe Ala Pro Val Tyr Thr Ala Arg Gly Arg Arg Asn
2885 2890 2895

Leu Leu Ser Ala Leu Val Ala Gly Arg Asp Ile Ile Ala Pro Ser Pro
2900 2905 2910

Pro Ala Ala Ala Thr Arg Asn Trp Arg Gly Leu Ser Val Ala Glu Ala
2915 2920 2925

Arg Val Ala Leu His Glu Ile Val His Gly Ala Val Ala Arg Val Leu
2930 2935 2940

Gly Phe Leu Asp Pro Ser Ala Leu Asp Pro Gly Met Gly Phe Asn Glu
2945 2950 2955 2960

Gln Gly Leu Asp Ser Leu Met Ala Val Glu Ile Arg Asn Leu Leu Gln
2965 2970 2975

Ala Glu Leu Asp Val Arg Leu Ser Thr Thr Leu Ala Phe Asp His Pro
2980 2985 2990

Thr Val Gln Arg Leu Val Glu His Leu Leu Val Asp Val Leu Lys Leu
2995 3000 3005

Glu Asp Arg Ser Asp Thr Gln His Val Arg Ser Leu Ala Ser Asp Glu
3010 3015 3020

Pro Ile Ala Ile Val Gly Ala Ala Cys Arg Phe Pro Gly Gly Val Glu
3025 3030 3035 3040

Asp Leu Glu Ser Tyr Trp Gln Leu Leu Ala Glu Gly Val Val Val Ser
3045 3050 3055

Ala Glu Val Pro Ala Asp Arg Trp Asp Ala Ala Asp Trp Tyr Asp Pro
3060 3065 3070

Asp Pro Glu Ile Pro Gly Arg Thr Tyr Val Thr Lys Gly Ala Phe Leu
3075 3080 3085

Arg Asp Leu Gln Arg Leu Asp Ala Thr Phe Phe Arg Ile Ser Pro Arg
3090 3095 3100

Glu Ala Met Ser Leu Asp Pro Gln Gln Arg Leu Leu Leu Glu Val Ser
3105 3110 3115 3120

Trp Glu Ala Leu Glu Ser Ala Gly Ile Ala Pro Asp Thr Leu Arg Asp
3125 3130 3135

Ser Pro Thr Gly Val Phe Val Gly Ala Gly Pro Asn Glu Tyr Tyr Thr
3140 3145 3150

Gln Arg Leu Arg Gly Phe Thr Asp Gly Ala Ala Gly Leu Tyr Gly Gly
3155 3160 3165

Thr Gly Asn Met Leu Ser Val Thr Ala Gly Arg Leu Ser Phe Phe Leu
3170 3175 3180

Gly Leu His Gly Pro Thr Leu Ala Met Asp Thr Ala Cys Ser Ser Ser
3185 3190 3195 3200

Leu Val Ala Leu His Leu Ala Cys Gln Ser Leu Arg Leu Gly Glu Cys
3205 3210 3215

Asp Gln Ala Leu Val Gly Gly Val Asn Val Leu Leu Ala Pro Glu Thr
3220 3225 3230

Phe Val Leu Leu Ser Arg Met Arg Ala Leu Ser Pro Asp Gly Arg Cys
3235 3240 3245

Lys Thr Phe Ser Ala Asp Ala Asp Gly Tyr Ala Arg Gly Glu Gly Cys
3250 3255 3260

Ala Val Val Val Leu Lys Arg Leu Arg Asp Ala Gln Arg Ala Gly Asp
3265 3270 3275 3280

Ser Ile Leu Ala Leu Ile Arg Gly Ser Ala Val Asn His Asp Gly Pro
3285 3290 3295

Ser Ser Gly Leu Thr Val Pro Asn Gly Pro Ala Gln Gln Ala Leu Leu
3300 3305 3310

Arg Gln Ala Leu Ser Gln Ala Gly Val Ser Pro Val Asp Val Asp Phe
3315 3320 3325

Val Glu Cys His Gly Thr Gly Thr Ala Leu Gly Asp Pro Ile Glu Val
3330 3335 3340

Gln Ala Leu Ser Glutei Tyr Gly Pro Gly Arg Ser Gly Asp Arg Pro
3345 3350 3355 3360

Leu Val Leu Gly Ala Ala Lys Ala Asn Val Ala His Leu Glu Ala Ala
3365 3370 3375

Ser Gly Leu Ala Ser Leu Leu Lys Ala Val Leu Ala Leu Arg His Glu
3380 3385 3390

Gln Ile Pro Ala Gln Pro Glu Leu Gly Glu Leu Asn Pro His Leu Pro
3395 3400 3405

TrP 3«n Thr ^ Pr ° Val , A }f Val Pro *** h * s Ala Va l P"> Trp Gly
3415 3420

Arg Gly Ala ^ Pro Arg Arg Ala Gly Val Ser Ala Phe Gly Leu Ser
3425 3430 3435 3440

Gly Thr Asn Val His Val Val Leu Glu Glu Ala Pro Glu Val Glu Pro
3445 3450 3455

Ala Pro Ala Ala Pro Ala Arg Pro Val Glu Leu Val Val Leu Ser Ala
3460 3465 3470

Lys Ser Ala Ala Ala Leu Asp Ala Ala Ala Ala Arg Leu Ser Ala His
3475 3480 3485

Lel \?on Ma HiS Pr ° Glu L «» s er Leu Gly Asp val Ala Phe Ser Leu
3490 3495 3500

Ala Thr Thr Arg Ser Pro Met Glu His Arg Leu Ala Ile Ala Thr Thr
3505 3510 3515 3520

Ser Arg Glu Ala Leu Arg Gly Ala Leu Asp Ala Ala Ala Gln Gln Lys
3525 3530 3535

Thr Pro Gln Gly Ala Val Arg Gly Lys Ala Val Ser Ser Arg Gly Lys
3540 3545 3550

Leu Ala Phe Leu Phe Thr Gly Gln Gly Ala Gln Met Pro Gly Met Gly
3555 3560 3565

Arg Gly Leu Tyr Glu Thr Trp Pro Ala Phe Arg Glu Ala Phe Asp Arg
3570 3575 3580

Cys Val Ala Leu Phe Asp Arg Glu Ile Asp Gln Pro Leu Arg Glu Val
3585 3590 3595 3600

Met Trp Ala Ala Pro Gly Leu Ala Gln Ala Ala Arg Leu Asp Gln Thr
3605 3610 3615

Ala Tyr Ala Gln Pro Ala Leu Phe Ala Leu Glu Tyr Ala Leu Ala Ala
3620 3625 3630

Leu Trp Arg Ser Trp Gly Val Glu Pro His Val Leu Leu Gly His Ser
3635 3640 3645

Ile Gly Glu Leu Val Ala Ala Cys Val Ala Gly Val Phe Ser Leu Glu
3650 3655 3660

Asp Ala Val Arg Leu Val Ala Ala Arg Gly Arg Leu Met Gln Ala Leu
3665 3670 3675 3680

Pro Ala Gly Gly Ala Met Val Ala Ile Ala Ala Ser Glu Ala Glu Val
3685 3690 3695

Ala Ala Ser Val Ala Pro His Ala Ala Thr Val Ser Ile Ala Ala Val
3700 3705 3710

Asn Gly Pro Asp Ala Val Val Ile Ala Gly Ala Glu Val Gln Val Leu
3715 3720 3725

Ala Leu Gly Ala Thr Phe Ala Ala Arg Gly Ile Arg Thr Lys Arg Leu
3730 3735 3740

Ala Val Ser His Ala Phe His Ser Pro Leu Met Asp Pro Met Leu Glu
3745 3750 3755 3760

Asp Phe Gln Arg Val Ala Ala Thr Ile Ala Tyr Arg Ala Pro Asp Arg
3765 3770 3775

Pro Val Val Ser Asn Val Thr Gly His Val Ala Gly Pro Glu Ile Ala
3780 3785 3790

Thr Pro Glu Tyr Trp Val Arg His Val Arg Ser Ala Val Arg Phe Gly
3795 3800 3805

Asp Gly Ala Lys Ala Leu His Ala Ala Gly Ala Ala Thr Phe Val Glu
3810 3815 3820

Val Gly Pro Lys Pro Val Leu Leu Gly Leu Leu Pro Ala Cys Leu Gly
3825 3830 3835 3840

Glu Ala Asp Ala Val Leu Val Pro Ser Leu Arg Ala Asp Arg Ser Glu
3845 3850 3855

Cys Glu Val Val Leu Ala Ala Leu Gly Ala Trp Tyr Ala Trp Gly Gly
3860 3865 3870

Ala Leu Asp Trp Lys Gly Val Phe Pro Asp Gly Ala Arg Arg Val Ala
3875 3880 3885

Leu Pro Met Tyr Pro Trp Gln Arg Glu Arg His Trp Met Asp Leu Thr
3890 3895 3900

Pro Arg Ser Ala Ala Pro Ala Gly Ile Ala Gly Arg Trp Pro Leu Ala
3905 3910 3915 3920

Gly Val Gly Leu Cys Met Pro Gly Ala Val Leu His His Val Leu Ser
3925 3930 3935

Ile Gly Pro Arg His Gln Pro Phe Leu Gly Asp His Leu Val Phe Gly
3940 3945 3950

Lys Val Val Val Pro Gly Ala Phe His Val Ala Val Ile Leu Ser Ile
3955 3960 3965

Ala Ala Glu Arg Trp Pro Glu Arg Ala Ile Glu Leu Thr Gly Val Glu
3970 3975 3980

Phe Leu Lys Ala Ile Ala Met Glu Pro Asp Gln Glu Val Glu Leu His
3985 3990 3995 4000

Ala Val Leu Thr Pro Glu Ala Ala Gly Asp Gly Tyr Leu Phe Glu Leu
4005 4010 4015

Ala Thr Leu Ala Ala Pro Glu Thr Glu Arg Arg Trp Thr Thr His Ala
4020 4025 4030

Arg Gly Arg Val Gln Pro Thr Asp Gly Ala Pro Gly Ala Leu Pro Arg
4035 4040 4045

LeU 4050 ^ ° 1U ASP .£f MS 116 Gln Pro Leu ph « Ala Gly
4055 4060

Phe Leu Asp Arg Leu Ser Ala Val Arg Ile Gly Trp Gly Pro Leu Trp
4065 4070 4075 4080

Arg Trp Leu Gln Asp Gly Arg Val Gly Asp Glu Ala Ser Leu Ala Thr
4085 4090 4095

Leu Val Pro Thr Tyr Pro Asn Ala His Asp Val Ala Pro Leu His Pro
4100 4105 4110

Ile Leu Leu Asp Asn Gly Phe Ala Val Ser Leu Leu Ser Thr Arg Ser
4115 4120 4125

G1U 4?™ G1U ** P *** Gly . p « Leu Pro Phe Ala Val Glu Arg
4135 4140

Val Arg Trp Trp Arg Ala Pro Val Gly Arg Val Arg Cys Gly Gly Val
4150 4155 4160

Pro Arg Ser Gln Ala Phe Gly Val Ser Ser Phe Val Leu Val Asp Glu
4165 4170 4175

Thr Gly Glu Val Val Ala Glu Val Glu Gly Phe Val Cys Arg Arg Ala
4180 4185 4190

Pro Arg Glu Val Phe Leu Arg Gln Glu Ser Gly Ala Ser Thr Ala Ala
4195 4200 4205

Leu Tyr Arg Leu Asp Trp Pro Glu Ala Pro Leu Pro Asp Ala Pro Ala
4210 4215 4220

Glu Arg Ile Glu Glu Ser Trp Val Val Val Ala Ala Pro Gly Ser Glu
4225 4230 4235 4240

Met Ala Ala Ala Leu Ala Thr Arg Leu Asn Arg Cys Val Leu Ala Glu
4245 4250 4255

Pro Lys Gly Leu Glu Ala Ala Leu Ala Gly Val Ser Pro Ala Gly Val
4260 4265 4270

Ile Cys Leu Trp Glu Ala Gly Ala His Glu Glu Ala Pro Ala Ala Ala
4275 4280 4285

Gln Arg Val Ala Thr Glu Gly Leu Ser Val Val Gln Ala Leu Arg Asp
4290 4295 4300

Arg Ala Val Arg Leu Trp Trp Val Thr Met Gly Ala Val Ala Val Glu
4305 4310 4315 4320

Ala Gly Glu Arg Val Gln Val Ala Thr Ala Pro Val Trp Gly Leu Gly
4325 4330 4335

Arg Thr Val Met Gln Glu Arg Pro Glu Leu Ser Cys Thr Leu Val Asp
4340 4345 4350

Leu Glu Pro Glu Ala Asp Ala Ala Arg Ser Ala Asp Val Leu Leu Arg
4355 4360 4365

Glu Leu Gly Arg Ala Asp Asp Glu Thr Gln Val Ala Phe Arg Ser Gly
4370 4375 4380

Lys Arg Arg Val Ala Arg Leu Val Lys Ala Thr Thr Pro Glu Gly Leu
4385 4390 4395 4400

Leu Val Pro Asp Ala Glu Ser Tyr Arg Leu Glu Ala Gly Gln Lys Gly
4405 4410 4415

Thr Leu Asp Gln Leu Arg Leu Ala Pro Ala Gln Arg Arg Ala Pro Gly
4420 4425 4430

Pro Gly Glu Val Glu Ile Lys Val Thr Ala Ser Gly Leu Asn Phe Arg
4435 4440 4445

Thr Val Leu Ala Val Leu Gly Met Tyr Pro Gly Asp Ala Gly Pro Met
4450 4455 4460

Gly Gly Asp Cys Ala Gly Val Ala Thr Ala Val Gly Gln Gly Val Arg
4465 4470 4475 4480

His Val Ala Val Gly Asp Ala Val Met Thr Leu Gly Thr Leu His Arg
4485 4490 4495

Phe Val Thr Val Asp Ala Arg Leu Val Val Arg Gln Pro Ala Gly Leu
4500 4505 4510

Thr Pro Ala Gln Ala Ala Thr Val Pro Val Ala Phe Leu Thr Ala Trp
4515 4520 4525

Leu Ala Leu His Asp Leu Gly Asn Leu Arg Arg Gly Glu Arg Val Leu
4530 4535 4540

Ile His Ala Ala Ala Gly Gly Val Gly Met Ala Ala Val Gln Ile Ala
4545 4550 4555 4560

Arg Trp Ile Gly Ala Glu Val Phe Ala Thr Ala Ser Pro Ser Lys Trp
4565 4570 4575

Ala Ala Val Gln Ala Met Gly Val Pro Arg Thr His Ile Ala Ser Ser
4580 4585 4590

*** Thr ^ G1U PhS Ala G1U Thr Phe Gln Va l Thr Gly Gly Arg
4595 4600 4605

^So ^ ^ LeU 46ts ^ ^ Gly -- PhC V&1 Ala
4620

Ser Leu Ser Leu Leu Ser Thr Gly Gly Arg Phe Leu Glu Met Gly Lys
4625 4630 4635 4640

Thr Asp Ile Arg Asp Arg Ala Ala Val Ala Ala Ala His Pro Gly Val
4645 4650 4655

Arg Tyr Arg Val Phe Asp Ile Leu Glu Leu Ala Pro Asp Arg Thr Arg
4660 4665 4670

Glu Ile Leu Glu Arg Val Val Glu Gly Phe Ala Ala Gly His Leu Arg
4675 4680 4685

Ala 46lo ^ AXa ^o! Ala 116 Thr Lys Ala Glu Ala Ala Phe
4695 4700

Arg Phe Met Ala Gln Ala Arg His Gln Gly Lys Val Val Leu Leu Pro
4705 4710 4715 4720

Ala Pro Ser Ala Ala Pro Leu Ala Pro Thr Gly Thr Val Leu Leu Thr
4725 4730 4735

Gly Gly Leu Gly Ala Leu Gly Leu His Val Ala Arg Trp Leu Ala Gln
4740 4745 4750

Gln Gly Val Pro His Met Val Leu Thr Gly Arg Arg Gly Leu Asp Thr
4755 4760 4765

PrP 4 G 70 Ala Ala LyS Ala ^ 3 i Ala Glu 116 Glu Al* Leu Gly Ala Arg
4775 4780

Val Thr Ile Ala Ala Ser Asp Val Ala Asp Arg Asn Ala Leu Glu Ala
4785 4790 4795 4800

Val Leu Gln Ala Ile Pro Ala Glu Trp Pro Leu Gln Gly Val Ile His
4805 4810 4815

Ala Ala Gly Ala Leu Asp Asp Gly Val Leu Asp Glu Gln Thr Thr Asp
4820 4825 4830

Arg Phe Ser Arg Val Leu Ala Pro Lys Val Thr Gly Ala Trp Asn Leu
4835 4840 4845

HiB JSn"- tBU Thr Ala Gly Asn Asp Leu Ala Phe Phe Val Leu Phe Ser
4850 4855 4860

f!L Met Ser Gly Leu Leu Gly Ser Ala Gly Gln Ser Asn Tyr Ala Ala
4865 4870 4875 4880

Ala Asn Thr Phe Leu Asp Ala Leu Ala Ala His Arg Arg Ala Glu Gly
4885 4890 4895

Leu Ala Ala Gln Ser Leu Ala Trp Gly Pro Trp Ser Asp Gly Gly Met
4900 4905 4910

Ala Ala Gly Leu Ser Ala Ala Leu Gln Ala Arg Leu Ala Arg His Gly
4915 4920 4925

Met Gly Ala Leu Ser Pro Ala Gln Gly Thr Ala Leu Leu Gly Gln Ala
4930 4935 4940

Leu Ala Arg Pro Glu Thr Gln Leu Gly Ala Met Ser Leu Asp Val Arg
4945 4950 4955 4960

Ala Ala Ser Gln Ala Ser Gly Ala Ala Val Pro Pro Val Trp Arg Ala
4965 4970 4975

Leu Val Arg Ala Glu Ala Arg His Thr Ala Ala Gly Ala Gln Gly Ala
4980 4985 4990

Leu Ala Ala Arg Leu Gly Ala Leu Pro Glu Ala Arg Arg Ala Asp Glu
4995 5000 5005

Val Arg Lys Val Val Gln Ala Glu Ile Ala Arg Val Leu Ser Trp Ser
5010 5015 5020

Ala Ala Ser Ala Val Pro Val Asp Arg Pro Leu Ser Asp Leu Gly Leu
5025 5030 5035 5040

Asp Ser Leu Thr Ala Val Glu Leu Arg Asn Val Leu Gly Gln Arg Val
5045 5050 5055

Gly Ala Thr Leu Pro Ala Thr Leu Ala Phe Asp His Pro Thr Val Asp
5060 5065 5070

Ala Leu Thr Arg Trp Leu Leu Asp Lys Val Leu Ala Val Ala Glu Pro
5075 5080 5085

Ser Val Ser Ser Ala Lys Ser Ser Pro Gln Val Ala Leu Asp Glu Pro
5090 5095 5100

Ile Ala Ile Ile Gly Ile Gly Cys Arg Phe Pro Gly Gly Val Ala Asp
5105 5110 5115 5120

Pro Glu Ser Phe Trp Arg Leu Leu Glu Glu Gly Ser Asp Ala Val Val
5125 5130 5135

Glu Val Pro His Glu Arg Trp Asp Ile Asp Ala Phe Tyr Asp Pro Asp
5140 5145 5150

Pro Asp Val Arg Gly Lys Met Thr Thr Arg Phe Gly Gly Phe Leu Ser
5155 5160 5165

Asp Ile Asp Arg Phe Asp Pro Ala Phe Phe Gly Ile Ser Pro Arg Glu
5170 5175 5180

Ala Thr Thr Met Asp Pro Gln Gln Arg Leu Leu Leu Glu Thr Ser Trp
5185 5190 5195 5200

Glu Ala Phe Glu Arg Ala Gly Ile Leu Pro Glu Arg Leu Met Gly Ser
5205 5210 5215

Asp Thr Gly Val Phe Val Gly Leu Phe Tyr Gln Glu Tyr Ala Ala Leu
5220 5225 5230

Ala Gly Gly Ile Glu Ala Phe Asp Gly Tyr Leu Gly Thr Gly Thr Thr
5235 5240 5245

Ala Ser Val Ala Ser Gly Arg Ile Ser Tyr Val Leu Gly Leu Lys Gly
5250 5255 5260

Pro Ser Leu Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Val
5265 5270 5275 5280

His Leu Ala Cys Gln Ala Leu Arg Arg Gly Glu Cys Ser Val Ala Leu
5285 5290 5295

Ala Gly Gly Val Ala Leu Met Leu Thr Pro Ala Thr Phe Val Glu Phe
5300 5305 5310

Ser Arg Leu Arg Gly Leu Ala Pro Asp Gly Arg Cys Lys Ser Phe Ser
5315 5320 5325

Ala Ala Ala Asp Gly Val Gly Trp Ser Glu Gly Cys Ala Met Leu Leu
5330 5335 5340

Leu Lys Pro Leu Arg Asp Ala Gln Arg Asp Gly Asp Pro Ile Leu Ala
5345 5350 5355 5360

Val Ile Arg Gly Thr Ala Val Asn Gln Asp Gly Arg Ser Asn Gly Leu
5365 5370 5375

Thr Ala Pro Asn Gly Ser Ser Gln Gln Glu Val Ile Arg Arg Ala Leu
5380 5385 5390

Glu Gln Ala Gly Leu Ala Pro Ala Asp Val Ser Tyr Val Glu Cys His
5395 5400 5405

Gly Thr Gly Thr Thr Leu Gly Asp Pro Ile Glu Val Gln Ala Leu Gly
5410 5415 5420

Ala Val Leu Ala Gln Gly Arg Pro Ser Asp Arg Pro Leu Val Ile Gly
5425 5430 5435 5440

Ser Val Lys Ser Asn Ile Gly His Thr Gln Ala Ala Ala Gly Val Ala
5445 5450 5455

Gly Val Ile Lys Val Ala Leu Ala Leu Glu Arg Gly Leu Ile Pro Arg
5460 5465 5470

Ser Leu His Phe Asp Ala Pro Asn Pro His Ile Pro Trp Ser Glu Leu
5475 5480 5485

Ala Val Gln Val Ala Ala Lys Pro Val Glu Trp Thr Arg Asn Gly Val
5490 5495 5500

Pro Arg Arg Ala Gly Val Ser Ser Phe Gly Val Ser Gly Thr Asn Ala
5505 5510 5515 5520

His Val Val Leu Glu Glu Ala Pro Ala Ala Ala Phe Ala Pro Ala Ala
5525 5530 5535

Ala Arg Ser Ala Glu Leu Phe Val Leu Ser Ala Lys Ser Ala Ala Ala
5540 5545 5550

Leu Asp Ala Gln Ala Ala Arg Leu Ser Ala His Val Val Ala His Pro
5555 5560 5565

Glu Leu Gly Leu Gly Asp Leu Ala Phe Ser Leu Ala Thr Thr Arg Ser
5570 5575 5580

Pro Met Thr Tyr Arg Leu Ala Val Ala Ala Thr Ser Arg Glu Ala Leu
5585 5590 5595 5600

Ser Ala Ala Leu Asp Thr Ala Ala Gln Gly Gln Ala Pro Pro Ala Ala
5605 5610 5615

Ala Arg Gly His Ala Ser Thr Gly Ser Ala Pro Lys Val Val Phe Val
5620 5625 5630

Phe Pro Gly Gln Gly Ser Gln Trp Leu Gly Met Gly Gln Lys Leu Leu
5635 5640 5645

Ser Glu Glu Pro Val Phe Arg Asp Ala Leu Ser Ala Cys Asp Arg Ala
5650 5655 5660

Ile Gln Ala Glu Ala Gly Trp Ser Leu Leu Ala Glu Leu Ala Ala Asp
5665 5670 5675 5680

Glu Thr Thr Ser Gln Leu Gly Arg Ile Asp Val Val Gln Pro Ala Leu
5685 5690 5695

Phe Ala Ile Glu Val Ala Leu Ser Ala Leu Trp Arg Ser Trp Gly Val
5700 5705 5710

Glu Pro Asp Ala Val Val Gly His Ser Met Gly Glu Val Ala Ala Ala
5715 5720 5725

His Val Ala Gly Ala Leu Ser Leu Glu Asp Ala Val Ala Ile Ile Cys
5730 5735 5740

Arg Arg Ser Leu Leu Leu Arg Arg Ile Ser Gly Gln Gly Glu Met Ala
5745 5750 5755 5760

Val Val Glu Leu Ser Leu Ala Glu Ala Glu Ala Ala Leu Leu Gly Tyr
5765 5770 5775

Glu Asp Arg Leu Ser Val Ala Val Ser Asn Ser Pro Arg Ser Thr Val
5780 5785 5790

Leu Ala Gly Glu Pro Ala Ala Leu Ala Glu Val Leu Ala Ile Leu Ala
5795 5800 5805

Ala Lys Gly Val Phe Cys Arg Arg Val Lys Val Asp Val Ala Ser His
5810 5815 5820

Ser Pro Gln Ile Asp Pro Leu Arg Asp Glu Leu Leu Ala Ala Leu Gly
5825 5830 5835 5840

Glu Leu Glu Pro Arg Gln Ala Thr Val Ser Met Arg Ser Thr Val Thr
5845 5850 5855

Ser Thr Ile Met Ala Gly Pro Glu Leu Val Ala Ser Tyr Trp Ala Asp
5860 5865 5870

Asn Val Arg Gln Pro Val Arg Phe Ala Glu Ala Val Gln Ser Leu Met
5875 5880 5885

Glu Asp Gly His Gly Leu Phe Val Glu Met Ser Pro His Pro Ile Leu
5890 5895 5900

Thr Thr Ser Val Glu Glu Ile Arg Arg Ala Thr Lys Arg Glu Gly Val
5905 5910 5915 5920

Ala Val Gly Ser Leu Arg Arg Gly Gln Asp Glu Arg Leu Ser Met Leu
5925 5930 5935

Glu Ala Leu Gly Ala Leu Trp Val His Gly Gln Ala Val Gly Trp Glu
5940 5945 5950

Arg Leu Phe Ser Ala Gly Gly Ala Gly Leu Arg Arg Val Pro Leu Pro
5955 5960 5965

Thr Tyr Pro Trp Gln Arg Glu Arg Tyr Trp Val Asp Ala Pro Thr Gly
5970 5975 5980

Gly Ala Ala Gly Gly Ser Arg Phe Ala His Ala Gly Ser His Pro Leu
5985 5990 5995 6000

Leu Gly Glu Met Gln Thr Leu Ser Thr Gln Arg Ser Thr Arg Val Trp
6005 6010 6015

Glu Thr Thr Leu Asp Leu Lys Arg Leu Pro Trp Leu Gly Asp His Arg
6020 6025 6030

Val Gln Gly Ala Val Val Phe Pro Gly Ala Ala Tyr Leu Glu Met Ala
6035 6040 6045

Leu Ser Ser Gly Ala Glu Ala Leu Gly Asp Gly Pro Leu Gln Val Ser
6050 6055 6060

Asp Val Val Leu Ala Glu Ala Leu Ala Phe Ala Asp Asp Thr Pro Ala
6065 6070 6075 6080

Ala Val Gln Val Met Ala Thr Glu Glu Arg Pro Gly Arg Leu Gln Phe
6085 6090 6095

His Val Ala Ser Arg Val Pro Gly His Gly Gly Ala Ala Phe Arg Ser
6100 6105 6110

His Ala Arg Gly Val Leu Arg Gln Ile Glu Arg Ala Glu Val Pro Ala
6115 6120 6125

Arg Leu Asp Leu Ala Ala Leu Arg Ala Arg Leu Gln Ala Ser Ala Pro
6130 6135 6140

Ala Ala Ala Thr Tyr Ala Ala Leu Ala Glu Met Gly Leu Glu Tyr Gly
6145 6150 6155 6160

Pro Ala Phe Gln Gly Leu Val Glu Leu Trp Arg Gly Glu Gly Glu Ala
6165 6170 6175

Leu Gly Arg Val Arg Leu Pro Glu Ala Ala Gly Ser Pro Ala Ala Cys
6180 6185 6190

Arg Leu His Pro Ala Leu Leu Asp Ala Cys Phe His Val Ser Ser Ala
6195 6200 6205

Phe Ala Asp Arg Gly Glu Ala Thr Pro Trp Val Pro Val Glu Ile Gly
6210 6215 6220

Ser Leu Arg Trp Phe Gln Arg Pro Ser Gly Glu Leu Trp Cys His Ala
6225 6230 6235 6240

Arg Ser Val Ser His Gly Lys Pro Thr Pro Asp Arg Arg Ser Thr Asp
6245 6250 6255

Phe Trp Val Val Asp Ser Thr Gly Ala Ile Val Ala Glu Ile Ser Gly
6260 6265 6270

Leu Val Ala Gln Arg Leu Ala Gly Gly Val Arg Arg Arg Glu Glu Asp
6275 6280 6285

Asp Trp Phe Met Glu Pro Ala Trp Glu Pro Thr Ala Val Pro Gly Ser
6290 6295 6300

Glu Val Met Ala Gly Arg Trp Leu Leu Ile Gly Ser Gly Gly Gly Leu
6305 6310 6315 6320

Gly Ala Ala Leu His Ser Ala Leu Thr Glu Ala Gly His Ser Val Val
6325 6330 6335

His Ala Thr Gly Arg Gly Thr Ser Ala Ala Gly Leu Gln Ala Leu Leu
6340 6345 6350

Thr Ala Ser Phe Asp Gly Gln Ala Pro Thr Ser Val Val His Leu Gly
6355 6360 6365

Ser Leu Asp Glu Arg Gly Val Leu Asp Ala Asp Ala Pro Phe Asp Ala
6370 6375 6380

Asp Ala Leu Glu Glu Ser Leu Val Arg Gly Cys Asp Ser Val Leu Trp
6385 6390 6395 6400

Thr Val Gln Ala Val Ala Gly Ala Gly Phe Arg Asp Pro Pro Arg Leu
6405 6410 6415

Trp Leu Val Thr Arg Gly Ala Gln Ala Ile Gly Ala Gly Asp Val Ser
6420 6425 6430

Val Ala Gln Ala Pro Leu Leu Gly Leu Gly Arg Val Ile Ala Leu Glu
6435 6440 6445

His Ala Glu Leu Arg Cys Ala Arg Ile Asp Leu Asp Pro Ala Arg Arg
6450 6455 6460

Asp Gly Glu Val Asp Glu Leu Leu Ala Glu Leu Leu Ala Asp Asp Ala
6465 6470 6475 6480

Glu Glu Glu Val Ala Phe Arg Gly Gly Glu Arg Arg Val Ala Arg Leu
6485 6490 6495

Val Arg Arg Leu Pro Glu Thr Asp Cys Arg Glu Lys Ile Glu Pro Ala
6500 6505 6510

Glu Gly Arg Pro Phe Arg Leu Glu Ile Asp Gly Ser Gly Val Leu Asp
6515 6520 6525

Asp Leu Val Leu Arg Ala Thr Glu Arg Arg Pro Pro Gly Pro Gly Glu
6530 6535 6540

Val Glu Ile Ala Val Glu Ala Ala Gly Leu Asn Phe Leu Asp Val Met
6545 6550 6555 6560

Arg Ala Met Gly Ile Tyr Pro Gly Pro Gly Asp Gly Pro Val Ala Leu
6565 6570 6575

Gly Ala Glu Cys Ser Gly Arg Ile Val Ala Met Gly Glu Gly Val Glu
6580 6585 6590

Ser Leu Arg Ile Gly Gln Asp Val Val Ala Val Ala Pro Phe Ser Phe
6595 6600 6605

Gly Thr His Val Thr Ile Asp Ala Arg Met Leu Ala Pro Arg Pro Ala
6610 6615 6620

Ala Leu Thr Ala Ala Gln Ala Ala Ala Leu Pro Val Ala Phe Met Thr
6625 6630 6635 6640

Ala Trp Tyr Gly Leu Val His Leu Gly Arg Leu Arg Ala Gly Glu Arg
6645 6650 6655

Val Leu Ile His Ser Ala Thr Gly Gly Thr Gly Leu Ala Ala Val Gln
6660 6665 6670

Ile Ala Arg His Leu Gly Ala Glu Ile Phe Ala Thr Ala Gly Thr Pro
6675 6680 6685

Glu Lys Arg Ala Trp Leu Arg Glu Gln Gly Ile Ala His Val Met Asp
6690 6695 6700

Ser Arg Ser Leu Asp Phe Ala Glu Gln Val Leu Ala Ala Thr Lys Gly
6705 6710 6715 6720

Glu Gly Val Asp Val Val Leu Asn Ser Leu Ser Gly Ala Ala Ile Asp
6725 6730 6735

Ala Ser Leu Ser Thr Leu Val Pro Asp Gly Arg Phe Ile Glu Leu Gly
6740 6745 6750

Lys Thr Asp Ile Tyr Ala Asp Arg Ser Leu Gly Leu Ala His Phe Arg
6755 6760 6765

Lys Ser Leu Ser Tyr Ser Ala Val Asp Leu Ala Gly Leu Ala Val Arg
6770 6775 6780

Arg Pro Glu Arg Val Ala Ala Leu Leu Ala Glu Val Val Asp Leu Leu
6785 6790 6795 6800

Ala Arg Gly Ala Leu Gln Pro Leu Pro Val Glu Ile Phe Pro Leu Ser
6805 6810 6815

Arg Ala Ala Asp Ala Phe Arg Lys Met Ala Gln Ala Gln His Leu Gly
6820 6825 6830

Lys Leu Val Leu Ala Leu Glu Asp Pro Asp Val Arg Ile Arg Val Pro
6835 6840 6845

Gly Glu Ser Gly Val Ala Ile Arg Ala Asp Gly Ala Tyr Leu Val Thr
6850 6855 6860

Gly Gly Leu Gly Gly Leu Gly Leu Ser Val Ala Gly Trp Leu Ala Glu
6865 6870 6875 6880

Gln Gly Ala Gly His Leu Val Leu Val Gly Arg Ser Gly Ala Val Ser
6885 6890 6895

Ala Glu Gln Gln Thr Ala Val Ala Ala Leu Glu Ala His Gly Ala Arg
6900 6905 6910

Val Thr Val Ala Arg Ala Asp Val Ala Asp Arg Ala Gln Met Glu Arg
6915 6920 6925

Ile Leu Arg Glu Val Thr Ala Ser Gly Met Pro Leu Arg Gly Val Val
6930 6935 6940

His Ala Ala Gly Ile Leu Asp Asp Gly Leu Leu Met Gln Gln Thr Pro
6945 6950 6955 6960

Ala Arg Phe Arg Ala Val Met Ala Pro Lys Val Arg Gly Ala Leu His
6965 6970 6975

Leu His Ala Leu Thr Arg Glu Ala Pro Leu Ser Phe Phe Val Leu Tyr
6980 6985 6990

Ala Ser Gly Ala Gly Leu Leu Gly Ser Pro Gly Gln Gly Asn Tyr Ala
6995 7000 7005

Ala Ala Asn Thr Phe Leu Asp Ala Leu Ala His His Arg Arg Ala Gln
7010 7015 7020

Gly Leu Pro Ala Leu Ser Ile Asp Trp Gly Leu Phe Ala Asp Val Gly
7025 7030 7035 7040

Leu Ala Ala Gly Gln Gln Asn Arg Gly Ala Arg Leu Val Thr Arg Gly
7045 7050 7055

Thr Arg Ser Leu Thr Pro Asp Glu Gly Leu Trp Ala Leu Glu Arg Leu
7060 7065 7070

Leu Asp Gly Asp Arg Thr Gln Ala Gly Val Met Pro Phe Asp Val Arg
7075 7080 7085

Gln Trp Val Glu Phe Tyr Pro Ala Ala Ala Ser Ser Arg Arg Leu Ser
7090 7095 7100

Arg Leu Met Thr Ala Arg Arg Val Ala Ser Gly Arg Leu Ala Gly Asp
7105 7110 7115 7120

Arg Asp Leu Leu Glu Arg Leu Ala Thr Ala Glu Ala Gly Ala Arg Ala
7125 7130 7135

Gly Met Leu Gln Glu Val Val Arg Ala Gln Val Ser Gln Val Leu Arg
7140 7145 7150

Leu Ser Glu Gly Lys Leu Asp Val Asp Ala Pro Leu Thr Ser Leu Gly
7155 7160 7165

Met Asp Ser Leu Met Gly Leu Glu Leu Arg Asn Arg Ile Glu Ala Val
7170 7175 7180

Leu Gly Ile Thr Met Pro Ala Thr Leu Leu Trp Thr Tyr Pro Thr Val
7185 7190 7195 7200

Ala Ala Leu Ser Ala His Leu Ala Ser His Val Val Ser Thr Gly Asp
7205 7210 7215

Gly Glu Ser Ala Arg Pro Pro Asp Thr Gly Ser Val Ala Pro Thr Thr
7220 7225 7230

His Glu Val Ala Ser Leu Asp Glu Asp Gly Leu Phe Ala Leu Ile Asp
7235 7240 7245

Glu Ser Leu Ala Arg Ala Gly Lys Arg 
7250 7255 



<210> 6
<211> 3798
<212> PRT
<213> Sorangium cellulosum

<400> 6

Val Thr Asp Arg Glu Gly Gln Leu Leu Glu Arg Leu Arg Glu Val Thr
1 5 10 15



WO 99/66028 PCT/EP99/04171



Leu Ala Leu Arg Lys Thr Leu Asn Glu Arg Asp Thr Leu Glu Leu Glu
20 25 30

Lys Thr Glu Pro Ile Ala Ile Val Gly Ile Gly Cys Arg Phe Pro Gly
35 40 45

Gly Ala Gly Thr Pro Glu Ala Phe Trp Glu Leu Leu Asp Asp Gly Arg
50 55 60

Asp Ala Ile Arg Pro Leu Glu Glu Arg Trp Ala Leu Val Gly Val Asp
65 70 75 80

Pro Gly Asp Asp Val Pro Arg Trp Ala Gly Leu Leu Thr Glu Ala Ile
85 90 95

Asp Gly Phe Asp Ala Ala Phe Phe Gly Ile Ala Pro Arg Glu Ala Arg
100 105 110

Ser Leu Asp Pro Gln His Arg Leu Leu Leu Glu Val Ala Trp Glu Gly
115 120 125

Phe Glu Asp Ala Gly Ile Pro Pro Arg Ser Leu Val Gly Ser Arg Thr
130 135 140

Gly Val Phe Val Gly Val Cys Ala Thr Glu Tyr Leu His Ala Ala Val
145 150 155 160

Ala His Gln Pro Arg Glu Glu Arg Asp Ala Tyr Ser Thr Thr Gly Asn
165 170 175

Met Leu Ser Ile Ala Ala Gly Arg Leu Ser Tyr Thr Leu Gly Leu Gln
180 185 190

Gly Pro Cys Leu Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala
195 200 205

Ile His Leu Ala Cys Arg Ser Leu Arg Ala Arg Glu Ser Asp Leu Ala
210 215 220

Leu Ala Gly Gly Val Asn Met Leu Leu Ser Pro Asp Thr Met Arg Ala
225 230 235 240

Leu Ala Arg Thr Gln Ala Leu Ser Pro Asn Gly Arg Cys Gln Thr Phe
245 250 255

Asp Ala Ser Ala Asn Gly Phe Val Arg Gly Glu Gly Cys Gly Leu Ile
260 265 270

Val Leu Lys Arg Leu Ser Asp Ala Arg Arg Asp Gly Asp Arg Ile Trp
275 280 285

Ala Leu Ile Arg Gly Ser Ala Ile Asn Gln Asp Gly Arg Ser Thr Gly
290 295 300

Leu Thr Ala Pro Asn Val Leu Ala Gln Gly Ala Leu Leu Arg Glu Ala
305 310 315 320

Leu Arg Asn Ala Gly Val Glu Ala Glu Ala Ile Gly Tyr Ile Glu Thr
325 330 335

His Gly Ala Ala Thr Ser Leu Gly Asp Pro Ile Glu Ile Glu Ala Leu
340 345 350

Arg Ala Val Val Gly Pro Ala Arg Ala Asp Gly Ala Arg Cys Val Leu
355 360 365

Gly Ala Val Lys Thr Asn Leu Gly His Leu Glu Gly Ala Ala Gly Val
370 375 380

Ala Gly Leu Ile Lys Ala Thr Leu Ser Leu His His Glu Arg Ile Pro
385 390 395 400

Arg Asn Leu Asn Phe Arg Thr Leu Asn Pro Arg Ile Arg Ile Glu Gly
405 410 415

Thr Ala Leu Ala Leu Ala Thr Glu Pro Val Pro Trp Pro Arg Thr Gly
420 425 430

Arg Thr Arg Phe Ala Gly Val Ser Ser Phe Gly Met Ser Gly Thr Asn
435 440 445

Ala His Val Val Leu Glu Glu Ala Pro Ala Val Glu Pro Glu Ala Ala
450 455 460

Ala Pro Glu Arg Ala Ala Glu Leu Phe Val Leu Ser Ala Lys Ser Ala
465 470 475 480

Ala Ala Leu Asp Ala Gln Ala Ala Arg Leu Arg Asp His Leu Glu Lys
485 490 495

His Val Glu Leu Gly Leu Gly Asp Val Ala Phe Ser Leu Ala Thr Thr
500 505 510

Arg Ser Ala Met Glu His Arg Leu Ala Val Ala Ala Ser Ser Arg Glu
515 520 525

Ala Leu Arg Gly Ala Leu Ser Ala Ala Ala Gln Gly His Thr Pro Pro
530 535 540

Gly Ala Val Arg Gly Arg Ala Ser Gly Gly Ser Ala Pro Lys Val Val
545 550 555 560

Phe Val Phe Pro Gly Gln Gly Ser Gln Trp Val Gly Met Gly Arg Lys
565 570 575

Leu Met Ala Glu Glu Pro Val Phe Arg Ala Ala Leu Glu Gly Cys Asp
580 585 590

Arg Ala Ile Glu Ala Glu Ala Gly Trp Ser Leu Leu Gly Glu Leu Ser
595 600 605

Ala Asp Glu Ala Ala Ser Gln Leu Gly Arg Ile Asp Val Val Gln Pro
610 615 620

Val Leu Phe Ala Met Glu Val Ala Leu Ser Ala Leu Trp Arg Ser Trp
625 630 635 640

Gly Val Glu Pro Glu Ala Val Val Gly His Ser Met Gly Glu Val Ala
645 650 655

Ala Ala His Val Ala Gly Ala Leu Ser Leu Glu Asp Ala Val Ala Ile
660 665 670

Ile Cys Arg Arg Ser Arg Leu Leu Arg Arg Ile Ser Gly Gln Gly Glu
675 680 685

Met Ala Leu Val Glu Leu Ser Leu Glu Glu Ala Glu Ala Ala Leu Arg
690 695 700

Gly His Glu Gly Arg Leu Ser Val Ala Val Ser Asn Ser Pro Arg Ser
705 710 715 720

Thr Val Leu Ala Gly Glu Pro Ala Ala Leu Ser Glu Val Leu Ala Ala 
725 730 735 

Leu Thr Ala Lys Gly Val Phe Trp Arg Gin Val Lys Val Asp Val Ala 
740 745 750 

Ser His Ser Pro Gin Val Asp Pro Leu Arg Glu Glu Leu lie Ala Ala 
755 760 765 

Leu Gly Ala He Arg Pro Arg Ala Ala Ala Val Pro Met Arg Ser Thr 
770 775 780 

Val Thr Gly Gly Val lie Ala Gly Pro Glu Leu Gly Ala Ser Tyr Tro 
785 790 795 800 

Ala Asp Asn Leu Arg Gin Pro Val Arg Phe Ala Ala Ala Ala Gin Ala 
.805 810 815 

Leu Leu Glu Gly Gly Pro Ala Leu Phe He Glu Met Ser Pro His Pro 
820 825 830 

He Leu Val Pro Pro Leu Asp Glu He Gin Thr Ala Ala Glu Gin Glv 
835 840 ! 845 

Gly Ala Ala Val Gly Ser Leu Arg Arg Gly Gin Asp Glu Arg Ala Thr 
850 855 860 

Leu Leu Glu Ala Leu Gly Thr Leu Trp Ala Ser Gly Tyr Pro Val Ser 
865 870 875 880 

Trp Ala Arg Leu Phe Pro Ala Gly Gly Arg Arg Val Pro Leu Pro Thr 
885 890 895 

Tyr Pro Trp Gin His Glu Arg Cys Trp lie Glu Val Glu Pro Asp Ala 
900 905 910 

Arg Arc Leu Ala Ala Ala Asp Pro Thr Lys Asp Tro Phe Tyr Arg T*~ 
915 920 * 925 

Asp Trp Pro Glu Val Pro Arg Ala Ala Pro Lys Ser Glu Thr Ala His 
930 935 940 

Gly Ser Trp Leu Leu Leu Ala Asp Arg Gly Gly Val Gly Glu Ala Val 
945 950 955 960 

Ala Ala Ala Leu Ser Thr Arg Gly Leu Ser Cys Thr Val Leu His Ala 
965 970 975 

Ser Ala Asp Ala Ser Thr Val Ala Glu Gin Val Ser Glu Ala Ala Ser 
980 985 990 

Arg Arg Asn Asp Trp Gin Gly Val Leu Tyr Leu Trp Gly Leu Asp Ala 
995 1000 1005 

Val Val Asp Ala Gly Ala Ser Ala Asp Glu Val Ser Glu Ala Thr Arg 
1010 1015 1020 

Arg Ala Thr Ala Pro Val Leu Gly Leu Val Arg Phe Leu Ser Ala Ala 
1025 1030 1035 1040 

Pro His Pro Pro Arg Phe Trp Val Val Thr Arg Gly Ala Cys Thr Val 
1045 1050 1055 
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Gly Gly Glu Pro Glu Ala Ser Leu Cys Gin Ala Ala Leu Trp Gly Leu . 
1060 1065 . 1070 

Ala Arg Val Ala Ala Leu Glu His Pro Ala Ala Trp Gly Gly Leu Val 
1075 1080 1085 

Asp Leu Asp Pro Gin Lys Ser. Pro Thr Glu lie Glu Pro Leu Val Ala 
1090 ' 1095 1100 

Glu Leu Leu Ser Pro Asp Ala Glu Asp Gin Leu Ala Phe Arg Ser Gly 
1105 1110 1115 1120 

Arg Arg His Ala Ala Arg Leu Val Ala Ala Pro Pro Glu Gly Asp Val 
1125 1130 1135 - 

Ala Pro lie Ser Leu Ser Ala Glu Gly Ser Tyr Leu Val Thr Gly Gly 
1140 . 1145 1150 

Leu Gly Gly Leu Gly Leu Leu Val Ala Arg Trp Leu Val, Glu Arg Gly 
1155 ' 1160 .. 1165 

Ala Arg His Leu Val Leu Thr Ser Arg His Gly Leu Pro Glu Arg Gin 
1170 1175 1180 

Ala Ser Gly Gly Glu Gin Pro Pro Glu Ala Arg Ala Arg lie Ala Ala 
1185 ■ .1190 1195 1200 

Val Glu Gly Leu Glu Ala Gin Gly Ala Arg Val Thr Val Ala Ala Val 
1205 . 1210 1215 

Asp Val Ala Glu Ala Asp Pro Met Thr Ala Leu Leu Ala Ala lie Glu 
1220 1225 1230 

Pro Pro Leu Arg Gly Val Val His Ala Ala Gly Val Phe Pro Val Arg 
1235 1240 1245 

His Leu Ala Glu Thr Asp Glu Ala Leu Leu Glu Ser Val Leu Arg Pro 
1250 1255 1260

Lys Val Ala Gly Ser Trp Leu Leu His Arg Leu Leu Arg Asp Arg Pro 
1265 1270 1275 1280 

Leu Asp Leu Phe Val Leu Phe Ser Ser Gly Ala Ala Val Trp Gly Gly 
1285 1290 1295 

Lys Gly Gln Gly Ala Tyr Ala Ala Ala Asn Ala Phe Leu Asp Gly Leu
1300 1305 1310 

Ala His His Arg Arg Ala His Ser Leu Pro Ala Leu Ser Leu Ala Trp
1315 1320 1325 

Gly Leu Trp Ala Glu Gly Gly Met Val Asp Ala Lys Ala His Ala Arg 
1330 1335 1340 

Leu Ser Asp Ile Gly Val Leu Pro Met Ala Thr Gly Pro Ala Leu Ser
1345 1350 1355 1360 

Ala Leu Glu Arg Leu Val Asn Thr Ser Ala Val Gln Arg Ser Val Thr
1365 1370 1375

Arg Met Asp Trp Ala Arg Phe Ala Pro Val Tyr Ala Ala Arg Gly Arg 
1380 1385 1390 

Arg Asn Leu Leu Ser Ala Leu Val Ala Glu Asp Glu Arg Ala Ala Ser
1395 1400 1405

Pro ,ffS Val Pro Thr Ala Asn **g Ile Trp Arg Gly Leu Ser Val Ala
1410 1415 1420 

Glu Ser Arg Ser Ala Leu Tyr Glu Leu Val Arg Gly Ile Val Ala Arg
1425 1430 1435 1440 

Val Leu Gly Phe Ser Asp Pro Gly Ala Leu Asp Val Gly Arg Gly Phe 
1445 1450 1455 

Ala Glu Gln Gly Leu Asp Ser Leu Met Ala Leu Glu Ile Arg Asn Arg
1460 1465 1470 

Leu Gln Arg Glu Leu Gly Glu Arg Leu Ser Ala Thr Leu Ala Phe Asp
1475 1480 1485 

His Pro Thr Val Glu Arg Leu Val Ala His Leu Leu Thr Asp Val Leu 
1490 1495 1500 

^L Leu Glu Asp **g Ser Thr tog His Ile Arg Ser Val Ala Ala
1505 1510 1515 1520 

Asp Asp Asp Ile Ala Ile Val Gly Ala Ala Cys Arg Phe Pro Gly Gly
1525 1530 1535 

Asp Glu Gly Leu Glu Thr Tyr Trp Arg His Leu Ala Glu Gly Met Val 
1540 1545 1550 

Val Ser Thr Glu Val Pro Ala Asp Arg Trp Arg Ala Ala Asp Trp Tyr 
1555 1560 1565 

Asp Pro Asp Pro Glu Val Pro Gly Arg Thr Tyr Val Ala Lys Gly Ala 
1570 1575 1580

Phe Leu Arg Asp Val Arg Ser Leu Asp Ala Ala Phe Phe Ala Ile Ser
1585 1590 1595 1600 

Pro Arg Glu Ala Met Ser Leu Asp Pro Gln Gln Arg Leu Leu Leu Glu
1605 1610 1615 

Val Ser Trp Glu Ala Ile Glu Arg Ala Gly Gln Asp Pro Met Ala Leu
1620 1625 1630 

Arg Glu Ser Ala Thr Gly Val Phe Val Gly Met Ile Gly Ser Glu His
1635 1640 1645 

Ala Glu Arg Val Gln Gly Leu Asp Asp Asp Ala Ala Leu Leu Tyr Gly
1650 1655 1660 

Thr Thr Gly Asn Leu Leu Ser Val Ala Ala Gly Arg Leu Ser Phe Phe 
1665 1670 1675 1680

Leu Gly Leu His Gly Pro Thr Met Thr Val Asp Thr Ala Cys Ser Ser 
1685 1690 1695 

Ser Leu Val Ala Leu His Leu Ala Cys Gln Ser Leu Arg Leu Gly Glu
1700 1705 1710 

Cys Asp Gln Ala Leu Ala Gly Gly Ser Ser Val Leu Leu Ser Pro Arg
1715 1720 1725 

Ser Phe Val Ala Ala Ser Arg Met Arg Leu Leu Ser Pro Asp Gly Arg
1730 1735 1740 
Cys Lys Thr Phe Ser Ala Ala Ala Asp Gly Phe Ala Arg Ala Glu Gly
1745 1750 1755 1760

Cys Ala Val Val Val Leu Lys Arg Leu Arg Asp Ala Gln Arg Asp Arg
1765 1770 1775 

Asp Pro Ile Leu Ala Val Val Arg Ser Thr Ala Ile Asn His Asp Gly
1780 1785 1790 

Pro Ser Ser Gly Leu Thr Val Pro Ser Gly Pro Ala Gln Gln Ala Leu
1795 1800 1805

Leu Arg Gln Ala Leu Ala Gln Ala Gly Val Ala Pro Ala Glu Val Asp
1810 1815 1820 

Phe Val Glu Cys His Gly Thr Gly Thr Ala Leu Gly Asp Pro Ile Glu
1825 1830 1835 1840 

Val Gln Ala Leu Gly Ala Val Tyr Gly Arg Gly Arg Pro Ala Glu Arg
1845 1850 1855

Pro Leu Trp Leu Gly Ala Val Lys Ala Asn Leu Gly His Leu Glu Ala
1860 1865 1870

Ala Ala Gly Leu Ala Gly Val Leu Lys Val Leu Leu Ala Leu Glu His 
1875 1880 1885

Glu Gln Ile Pro Ala Gln Pro Glu Leu Asp Glu Leu Asn Pro His Ile
1890 1895 1900 

Pro Trp Ala Glu Leu Pro Val Ala Val Val Arg Arg Ala Val Pro Trp 
1905 1910 1915 1920

Pro Arg Gly Ala Arg Pro Arg Arg Ala Gly Val Ser Ala Phe Gly Leu
1925 1930 1935

Ser Gly Thr Asn Ala His Val Val Leu Glu Glu Ala Pro Ala Val Glu 
1940 1945 1950 

Pro Val Ala Ala Ala Pro Glu Arg Ala Ala Glu Leu Phe Val Leu Ser 
1955 1960 1965

Ala Lys Ser Ala Ala Ala Leu Asp Ala Gln Ala Ala Arg Leu Arg Asp
1970 1975 1980

His Leu Glu Lys His Val Glu Leu Gly Leu Gly Asp Val Ala Phe Ser 
1985 1990 1995 2000 

Leu Ala Thr Thr Arg Ser Ala Met Glu His Arg Leu Ala Val Ala Ala 
2005 2010 2015 

Ser Ser Arg Glu Ala Leu Arg Gly Ala Leu Ser Ala Ala Ala Gln Gly
2020 2025 2030 

His Thr Pro Pro Gly Ala Val Arg Gly Arg Ala Ser Gly Gly Ser Ala 
2035 2040 2045 

Pro Lys Val Val Phe Val Phe Pro Gly Gln Gly Ser Gln Trp Val Gly
2050 2055 2060 

Met Gly Arg Lys Leu Met Ala Glu Glu Pro Val Phe Arg Ala Ala Leu 
2065 2070 2075 2080

Glu Gly Cys Asp Arg Ala Ile Glu Ala Glu Ala Gly Trp Ser Leu Leu
2085 2090 2095 
Gly Glu Leu Ser Ala Asp Glu Ala Ala Ser Gln Leu Gly Arg Ile Asp
2100 2105 2110 

Val Val Gln Pro Val Leu Phe Ala Met Glu Val Ala Leu Ser Ala Leu
2115 2120 2125 

Trp n^2 Ser Trp Gly Val Glu Pro Glu Ala Val Val Gly His Ser Met
2130 2135 2140 

Gly Glu Val Ala Ala Ala His Val Ala Gly Ala Leu Ser Leu Glu Asp 
2145 2150 2155 2160

Ala Val Ala Ile Ile Cys Arg Arg Ser Arg Leu Leu Arg Arg Ile Ser
2165 2170 2175 

Gly Gln Gly Glu Met Ala Leu Val Glu Leu Ser Leu Glu Glu Ala Glu
2180 2185 2190 

Ala Ala Leu Arg Gly His Glu Gly Arg Leu Ser Val Ala Val Ser Asn 
2195 2200 2205 

Ser *f ° ^ Ser Thr Val Leu Ala Gly Glu Pro Ala Ala Leu Ser Glu
2210 2215 2220 

YoL Leu Ala Ala Leu Thr Ala Lys Gly Val Phe Trp Arg Gln Val Lys
2225 2230 2235 2240 

Val Asp Val Ala Ser His Ser Pro Gln Val Asp Pro Leu Arg Glu Glu
2245 2250 2255 

Leu Ile Ala Ala Leu Gly Ala Ile Arg Pro Arg Ala Ala Ala Val Pro
2260 2265 2270 

Met Arg Ser Thr Val Thr Gly Gly Val Ile Ala Gly Pro Glu Leu Gly
2275 2280 2285 

Ala ^™ ^ Trp Ala Asp Asn Leu Arg Gln Pro Val Arg Phe Ala Ala
2290 2295 2300 

£in* Ala Gln Ala Leu Leu Glu Gly Gly Pro Ala Leu Phe Ile Glu Met
2305 2310 2315 2320 

Ser Pro His Pro Ile Leu Val Pro Pro Leu Asp Glu Ile Gln Thr Ala
2325 2330 2335 

Ala Glu Gln Gly Gly Ala Ala Val Gly Ser Leu Arg Arg Gly Gln Asp
2340 2345 2350 

Glu Arg Ala Thr Leu Leu Glu Ala Leu Gly Thr Leu Trp Ala Ser Gly 
2355 2360 2365 

^^tn 0 Val Ser Trp Ala Arg Leu Phe Pro Ala Gly Gly Arg Arg Val 
2370 2375 2380

nf«e Leu ** q Thr ^ Pro Trp Gln His Glu Arg Tyr Trp Ile Glu Asp
2385 2390 2395 2400 

Ser Val His Gly Ser Lys Pro Ser Leu Arg Leu Arg Gln Leu Arg Asn
2405 2410 2415 

Gly Ala Thr Asp His Pro Leu Leu Gly Ala Pro Leu Leu Val Ser Ala 
2420 2425 2430 

Arg Pro Gly Ala His Leu Trp Glu Gln Ala Leu Ser Asp Glu Arg Leu
2435 2440 2445

Ser Tyr Leu Ser Glu His Arg Val His Gly Glu Ala Val Leu Pro Ser 
2450 2455 2460 

Ala Ala Tyr Val Glu Met Ala Leu Ala Ala Gly Val Asp Leu Tyr Gly
2465 2470 2475 2480

Thr Ala Thr Leu Val Leu Glu Gln Leu Ala Leu Glu Arg Ala Leu Ala
2485 2490 2495 

Val Pro Ser Glu Gly Gly Arg Ile Val Gln Val Ala Leu Ser Glu Glu
2500 2505 2510 

Gly Pro Gly Arg Ala Ser Phe Gln Val Ser Ser Arg Glu Glu Ala Gly
2515 2520 2525 

Arg Ser Trp Val Arg His Ala Thr Gly His Val Cys Ser Gly Gln Ser
2530 2535 2540 

Ser Ala Val Gly Ala Leu Lys Glu Ala Pro Trp Glu Ile Gln Arg Arg
2545 2550 2555 2560

Cys Pro Ser Val Leu Ser Ser Glu Ala Leu Tyr Pro Leu Leu Asn Glu 
2565 2570 2575 

His Ala Leu Asp Tyr Gly Pro Cys Phe Gln Gly Val Glu Gln Val Trp
2580 2585 2590 

Leu Gly Thr Gly Glu Val Leu Gly Arg Val Arg Leu Pro Gly Asp Met 
2595 2600 2605

Ala Ser Ser Ser Gly Ala Tyr Arg Ile His Pro Ala Leu Leu Asp Ala
2610 2615 2620 

Cys Phe Gln Val Leu Thr Ala Leu Leu Thr Thr Pro Glu Ser Ile Glu
2625 2630 2635 2640

Ile Arg Arg Arg Leu Thr Asp Leu His Glu Pro Asp Leu Pro Arg Ser
2645 2650 2655

Arg Ala Pro Val Asn Gln Ala Val Ser Asp Thr Trp Leu Trp Asp Ala
2660 2665 2670 

Ala Leu Asp Gly Gly Arg Arg Gln Ser Ala Ser Val Pro Val Asp Leu
2675 2680 2685

Val Leu Gly Ser Phe His Ala Lys Trp Glu Val Met Glu Arg Leu Ala 
2690 2695 2700

Gln Ala Tyr Ile Ile Gly Thr Leu Arg Ile Trp Asn Val Phe Cys Ala
2705 2710 2715 2720 

Ala Gly Glu Arg His Thr Ile Asp Glu Leu Leu Val Arg Leu Gln Ile
2725 2730 2735

Ser Val Val Tyr Arg Lys Val Ile Lys Arg Trp Met Glu His Leu Val
2740 2745 2750 

Ala Ile Gly Ile Leu Val Gly Asp Gly Glu His Phe Val Ser Ser Gln
2755 2760 2765

Pro Leu Pro Glu Pro Asp Leu Ala Ala Val Leu Glu Glu Ala Gly Arg 
2770 2775 2780 
- 9?L Phe Ala Val Leu Phe Glu Trp Cys Lys Phe Ala Gly
2785 2790 2795 2800

Glu Arg Leu Ala Asp Val Leu Thr Gly Lys Thr Leu Ala Leu Glu Ile
2805 2810 2815

Leu Phe Pro Gly Gly Ser Phe Asp Met Ala Glu Arg Ile Tyr Arg Asp
2820 2825 2830 

Ser Pro o^f Ala Arg ^ Ser Asn Gly Ile Val Arg Gly Val Val Glu
2835 2840 2845 

Ser o^n Ala Arg Val Val Ala Pro Ser Met Phe Ser Ile Leu Glu
2850 2855 2860 

Ile Gly Ala Gly Thr Gly Ala Thr Thr Ala Ala Val Leu Pro Val Leu
2865 2870 2875 2880

Leu Pro Asp Arg Thr Glu Tyr His Phe Thr Asp Val Ser Pro Leu Phe 
2885 2890 2895 

Leu Ala Arg Ala Glu Gln Arg Phe Arg Asp Tyr Pro Phe Leu Lys Tyr
2900 2905 2910 

Gly Ile Leu Asp Val Asp Gln Glu Pro Ala Gly Gln Gly Tyr Ala His
2915 2920 2925

GXl \£n Phe Val Ile Val Ala Ala Asn Val Ile His Ala Thr Arg
2930 2935 2940 

Asp Ile Arg Ala Thr Ala Lys Arg Leu Leu Ser Leu Leu Ala Pro Gly
2945 2950 2955 2960 

Gly Leu Leu Val Leu Val Glu Gly Thr Gly His Pro Ile Trp Phe Asp 
2965 2970 2975 

Ile Thr Thr Gly Leu Ile Glu Gly Trp Gln Lys Tyr Glu Asp Asp Leu
2980 2985 2990 

Arg Ile His Pro Leu Leu Pro Ala Arg Thr Trp Cys Asp Val Leu
2995 3000 3005

^.JfJ Val Gly Phe Ala Asp Ala Val Ser Leu Pro Gly Asp Gly Ser
3010 3015 3020

Pro Ala Gly Ile Leu Gly Gln His Val Ile Leu Ser Arg Ala Pro Gly
3025 3030 3035 3040

Ile Ala Gly Ala Ala Cys Asp Ser Ser Gly Glu Ser Ala Thr Glu Ser 
3045 3050 3055 

Pro Ala Ala Arg Ala Val Arg Gln Glu Trp Ala Asp Gly Ser Ala Asp
3060 3065 3070 

Val Val His Arg Met Ala Leu Glu Arg Met Tyr Phe His Arg Arg Pro 
3075 3080 3085 

Gly ^ Val ^ g Leu Arg Thr Gly Gly Gly Ala
3090 3095

Phe Thr Lys Ala Leu Ala Gly Asp Leu Leu Leu Phe Glu Asp Thr Gly 
3105 3110 3115 3120

Gln Val Val Ala Glu Val Gln Gly Leu Arg Leu Pro Gln Leu Glu Ala
3125 3130 3135 
Ser Ala Phe Ala Pro Arg Asp Pro Arg Glu Glu Trp Leu Tyr Ala Leu
3140 3145 3150 

Glu Trp Gln Arg Lys Asp Pro Ile Pro Glu Ala Pro Ala Ala Ala Ser
3155 3160 3165 

Ser Ser Ser Ala Gly Ala Trp Leu Val Leu Met Asp Gln Gly Gly Thr
3170 3175 3180 

Gly Ala Ala Leu Val Ser Leu Leu Glu Gly Arg Gly Glu Ala Cys Val 
3185 3190 3195 3200

Arg Val Ile Ala Gly Thr Ala Tyr Ala Cys Leu Ala Pro Gly Leu Tyr
3205 3210 3215 

Gln Val Asp Pro Ala Gln Pro Asp Gly Phe His Thr Leu Leu Arg Asp
3220 3225 3230 

Ala Phe Gly Glu Asp Arg Ile Cys Arg Ala Val Val His Met Trp Ser
3235 3240 3245 

Leu Asp Ala Thr Ala Ala Gly Glu Arg Ala Thr Ala Glu Ser Leu Gln
3250 3255 3260 

Ala Asp Gln Leu Leu Gly Ser Leu Ser Ala Leu Ser Leu Val Gln Ala
3265 3270 3275 3280 

Leu Val Arg Arg Arg Trp Arg Asn Met Pro Arg Leu Trp Leu Leu Thr 
3285 3290 3295

Arg Ala Val His Ala Val Gly Ala Glu Asp Ala Ala Ala Ser Val Ala
3300 3305 3310

Gln Ala Pro Val Trp Gly Leu Gly Arg Thr Leu Ala Leu Glu His Pro
3315 3320 3325 

Glu Leu Arg Cys Thr Leu Val Asp Val Asn Pro Ala Pro Ser Pro Glu 
3330 3335 3340

Asp Ala Ala Ala Leu Ala Val Glu Leu Gly Ala Ser Asp Arg Glu Asp 
3345 3350 3355 3360

Gln Val Ala Leu Arg Ser Asp Gly Arg Tyr Val Ala Arg Leu Val Arg
3365 3370 3375 

Ser Ser Phe Ser Gly Lys Pro Ala Thr Asp Cys Gly Ile Arg Ala Asp
3380 3385 3390 

Gly Ser Tyr Val Ile Thr Asp Gly Met Gly Arg Val Gly Leu Ser Val
3395 3400 3405 

Ala Gln Trp Met Val Met Gln Gly Ala Arg His Val Val Leu Val Asp
3410 3415 3420 

Arg Gly Gly Ala Ser Glu Ala Ser Arg Asp Ala Leu Arg Ser Met Ala 
3425 3430 3435 3440 

Glu Ala Gly Ala Glu Val Gln Ile Val Glu Ala Asp Val Ala Arg Arg
3445 3450 3455 

Asp Asp Val Ala Arg Leu Leu Ser Lys Ile Glu Pro Ser Met Pro Pro
3460 3465 3470 

Leu Arg Gly Ile Val Tyr Val Asp Gly Thr Phe Gln Gly Asp Ser Ser
3475 3480 3485

Met 3490 Ala 3 Jff ■* rB PhS LyS G1U 5^ MSt ,Tyr Pr ° Lys 

Val Leu Gly Ala Trp Asn Leu His Ala Leu Thr Arg Asp Arg Ser Leu
3505 3510 3515 3520

Asp Phe Phe Val Leu Tyr Ser Ser Gly Thr Ser Leu Leu Gly Leu Pro
3525 3530 3535 

Gly Gln Gly Ser Arg Ala Ala Gly Asp Ala Phe Leu Asp Ala Ile Ala
3540 3545 3550 

His His Arg Cys Lys Val Gly Leu Thr Ala Met Ser Ile Asn Trp Gly
3555 3560 3565

Leu Leu Ser Glu Ala Ser Ser Pro Ala Thr Pro Asn Asp Gly Gly Ala 
3570 3575 3580

Arg Leu Glu Tyr Arg Gly Met Glu Gly Leu Thr Leu Glu Gln Gly Ala
3585 3590 3595 3600

Ala Ala Leu Gly Arg Leu Leu Ala Arg Pro Arg Ala Gln Val Gly Val
3605 3610 3615 

Met Arg Leu Asn Leu Arg Gln Trp Leu Glu Phe Tyr Pro Asn Ala Ala
3620 3625 3630 

Arg Leu Ala Leu Trp Ala Glu Leu Leu Lys Glu Arg Asp Arg Ala Asp
3635 3640 3645

Ala Ser Asn ^ Ser Asn Leu Arg Glu Ala Leu Gln Ser Ala
3650 3655 3660

Arg Pro Glu Asp Arg Gln Leu Ile Leu Glu Lys His Leu Ser Glu Leu
3665 3670 3675 3680 

Leu Gly Arg Gly Leu Arg Leu Pro Pro Glu Arg Ile Glu Arg His Val
3685 3690 3695 

Pro Phe Ser Asn Leu Gly Met Asp Ser Leu Ile Gly Leu Glu Leu Arg
3700 3705 3710 

Asn Arg Ile Glu Ala Ala Leu Gly Ile Thr Val Pro Ala Thr Leu Leu
3715 3720 3725

Trp Thr Tyr Pro Asn Val Ala Ala Leu Ser Gly Ser Leu Leu Asp Ile
3730 3735 3740

Leu Phe Pro Asn Ala Gly Ala Thr His Ala Pro Ala Thr Glu Arg Glu 
3745 3750 3755 3760

Lys Ser Phe Glu Asn Asp Ala Ala Asp Leu Glu Ala Leu Arg Gly Met 
3765 3770 3775 

Thr Asp Glu Gln Lys Asp Ala Leu Leu Ala Glu Lys Leu Ala Gln Leu
3780 3785 3790

Ala Gln Ile Val Gly Glu
3795 



<210> 7 
<211> 2439 
<212> PRT

<213> Sorangium cellulosum 
<400> 7 

Met Ala Thr Thr Asn Ala Gly Lys Leu Glu His Ala Leu Leu Leu Met 
1 5 10 15 

Asp Lys Leu Ala Lys Lys Asn Ala Ser Leu Glu Gln Glu Arg Thr Glu
20 25 30 

Pro Ile Ala Ile Val Gly Ile Gly Cys Arg Phe Pro Gly Gly Ala Asp
35 40 45

Thr Pro Glu Ala Phe Trp Glu Leu Leu Asp Ser Gly Arg Asp Ala Val 
50 55 60

Gln Pro Leu Asp Arg Arg Trp Ala Leu Val Gly Val His Pro Ser Glu
65 70 75 80 

Glu Val Pro Arg Trp Ala Gly Leu Leu Thr Glu Ala Val Asp Gly Phe 
85 90 95

Asp Ala Ala Phe Phe Gly Thr Ser Pro Arg Glu Ala Arg Ser Leu Asp
100 105 110

Pro Gln Gln Arg Leu Leu Leu Glu Val Thr Trp Glu Gly Leu Glu Asp
115 120 125

Ala Gly Ile Ala Pro Gln Ser Leu Asp Gly Ser Arg Thr Gly Val Phe
130 135 140 

Leu Gly Ala Cys Ser Ser Asp Tyr Ser His Thr Val Ala Gln Gln Arg
145 150 155 160

Arg Glu Glu Gln Asp Ala Tyr Asp Ile Thr Gly Asn Thr Leu Ser Val
165 170 175

Ala Ala Gly Arg Leu Ser Tyr Thr Leu Gly Leu Gln Gly Pro Cys Leu
180 185 190 

Thr Val Asp Thr Ala Cys Ser Ser Ser Leu Val Ala Ile His Leu Ala
195 200 205 

Cys Arg Ser Leu Arg Ala Arg Glu Ser Asp Leu Ala Leu Ala Gly Gly
210 215 220 

Val Asn Met Leu Leu Ser Ser Lys Thr Met Ile Met Leu Gly Arg Ile
225 230 235 240 

Gln Ala Leu Ser Pro Asp Gly His Cys Arg Thr Phe Asp Ala Ser Ala
245 250 255 

Asn Gly Phe Val Arg Gly Glu Gly Cys Gly Met Val Val Leu Lys Arg 
260 265 270 

Leu Ser Asp Ala Gln Arg His Gly Asp Arg Ile Trp Ala Leu Ile Arg
275 280 285 

Gly Ser Ala Met Asn Gin Asp Gly Arg Ser Thr Gly Leu Met Ala Pro 
290 295 300 

Asn Val Leu Ala Gln Glu Ala Leu Leu Arg Glu Ala Leu Gln Ser Ala
305 310 315 320

Arg Val Asp Ala Gly Ala Ile Gly Tyr Val Glu Thr His Gly Thr Gly
325 330 335

Thr Ser Leu Gly Asp Pro Ile Glu Val Glu Ala Leu Arg Ala Val Leu
340 345 350 

Gly Pro Ala Arg Ala Asp Gly Ser Arg Cys Val Leu Gly Ala Val Lys 
355 360 365 

Gly Ala Gly Val Ala Gly Leu Ile
370 375 380

Lys Ala Ala Leu Ala Leu His His Glu Leu Ile Pro Arg Asn Leu His
385 390 395 400 

Phe His Thr Leu Asn Pro Arg Ile Arg Ile Glu Gly Thr Ala Leu Ala
405 410 415

Leu Ala Thr Glu Pro Val Pro Trp Pro Arg Ala Gly Arg Pro Arg Phe 
420 425 430

Ala Gly Val Ser Ala Phe Gly Leu Ser Gly Thr Asn Val His Val Val
435 440 445 

Leu Glu Glu Ala Pro Ala Thr Val Leu Ala Pro Ala Thr Pro Gly Arg
450 455 460

Ser Ala Glu Leu Leu Val Leu Ser Ala Lys Ser Ala Ala Ala Leu Asp
465 470 475 480

Ala Gln Ala Ala Arg Leu Ser Ala His Ile Ala Ala Tyr Pro Glu Gln
485 490 495

Gly Leu Gly Asp Val Ala Phe Ser Leu Val Ser Thr Arg Ser Pro Met 
500 505 510 

Glu His Arg Leu Ala Val Ala Ala Thr Ser Arg Glu Ala Leu Arg Ser 
515 520 525 

Ala Leu Glu Val Ala Ala Gln Gly Gln Thr Pro Ala Gly Ala Ala Arg
530 535 540

Gly Arg Ala Ala Ser Ser Pro Gly Lys Leu Ala Phe Leu Phe Ala Gly
545 550 555 560

Gln Gly Ala Gln Val Pro Gly Met Gly Arg Gly Leu Trp Glu Ala Trp
565 570 575 

Pro Ala Phe Arg Glu Thr Phe Asp Arg Cys Val Thr Leu Phe Asp Arg 
580 585 590 

Glu Leu His Gln Pro Leu Cys Glu Val Met Trp Ala Glu Pro Gly Ser
595 600 605 

Ser Arg Ser Ser Leu Leu Asp Gln Thr Ala Phe Thr Gln Pro Ala Leu
610 615 620

Phe Ala Leu Glu Tyr Ala Leu Ala Ala Leu Phe Arg Ser Trp Gly Val
625 630 635 640

Glu Pro Glu Leu Val Ala Gly His Ser Leu Gly Glu Leu Val Ala Ala 
645 650 655

Cys Val Ala Gly Val Phe Ser Leu Glu Asp Ala Val Arg Leu Val Val 
660 665 670 
Ala Arg Gly Arg Leu Met Gln Ala Leu Pro Ala Gly Gly Ala Met Val
675 680 685 

Ser Ile Ala Ala Pro Glu Ala Asp Val Ala Ala Ala Val Ala Pro His
690 695 700 

Ala Ala Leu Val Ser Ile Ala Ala Val Asn Gly Pro Glu Gln Val Val
705 710 715 720

Ile Ala Gly Ala Glu Lys Phe Val Gln Gln Ile Ala Ala Ala Phe Ala
725 730 735

Ala Arg Gly Ala Arg Thr Lys Pro Leu His Val Ser His Ala Phe His 
740 745 750 

Ser Pro Leu Met Asp Pro Met Leu Glu Ala Phe Arg Arg Val Thr Glu
755 760 765 

Ser Val Thr Tyr Arg Arg Pro Ser Ile Ala Leu Val Ser Asn Leu Ser
770 775 780 

Gly Lys Pro Cys Thr Asp Glu Val Ser Ala Pro Gly Tyr Trp Val Arg 
785 790 795 800

His Ala Arg Glu Ala Val Arg Phe Ala Asp Gly Val Lys Ala Leu His
805 810 815 

Ala Ala Gly Ala Gly Leu Phe Val Glu Val Gly Pro Lys Pro Thr Leu 
820 825 830 

Leu Gly Leu Val Pro Ala Cys Leu Pro Asp Ala Arg Pro Val Leu Leu 
835 840 845 

Pro Ala Ser Arg Ala Gly Arg Asp Glu Ala Ala Ser Ala Leu Glu Ala 
850 855 860

Leu Gly Gly Phe Trp Val Val Gly Gly Ser Val Thr Trp Ser Gly Val 
865 870 875 880 

Phe Pro Ser Gly Gly Arg Arg Val Pro Leu Pro Thr Tyr Pro Trp Gln
885 890 895

Arg Glu Arg Tyr Trp Ile Glu Ala Pro Val Asp Arg Glu Ala Asp Gly
900 905 910

Thr Gly Arg Ala Arg Ala Gly Gly His Pro Leu Leu Gly Glu Val Phe 
915 920 925 

Ser Val Ser Thr His Ala Gly Leu Arg Leu Trp Glu Thr Thr Leu Asp 
930 935 940

Arg Lys Arg Leu Pro Trp Leu Gly Glu His Arg Ala Gln Gly Glu Val
945 950 955 960 

Val Phe Pro Gly Ala Gly Tyr Leu Glu Met Ala Leu Ser Ser Gly Ala 
965 970 975 

Glu Ile Leu Gly Asp Gly Pro Ile Gln Val Thr Asp Val Val Leu Ile
980 985 990 

Glu Thr Leu Thr Phe Ala Gly Asp Thr Ala Val Pro Val Gln Val Val
995 1000 1005

Thr Thr Glu Glu Arg Pro Gly Arg Leu Arg Phe Gln Val Ala Ser Arg
1010 1015 1020 
Glu Pro Gly Glu Arg Arg Ala Pro Phe Arg Ile His Ala Arg Gly Val
1025 1030 1035 1040

Leu Arg Arg Ile Gly Arg Val Glu Thr Pro Ala Arg Ser Asn Leu Ala
1045 1050 1055 

Ala Leu Arg Ala Arg Leu His Ala Ala Val Pro Ala Ala Ala Ile Tyr
1060 1065 1070 

Gly Ala Leu Ala Glu Met Gly Leu Gln Tyr Gly Pro Ala Leu Arg Gly
1075 1080 1085 

Leu ,££ Glu Leu Trp Gly Glu Gly Glu Ala Leu Gly Arg Val Arg 
1090 1095 1100 

Leu Pro Glu Ala Ala Gly Ser Ala Thr Ala Tyr Gln Leu His Pro Val
1105 1110 1115 1120

Leu Leu Asp Ala Cys Val Gln Met Ile Val Gly Ala Phe Ala Asp Arg
1125 1130 1135

Asp Glu Ala Thr Pro Trp Ala Pro Val Glu Val Gly Ser Val Arg Leu 
1140 1145 1150

Phe Gln Arg Ser Pro Gly Glu Leu Trp Cys His Ala Arg Val Val Ser
1155 1160 1165

Asp .?^ Gln Gln Ala Ser Ser Arg Trp Ser Ala Asp Phe Glu Leu Met
1170 1175 1180

Asp Gly Thr Gly Ala Val Val Ala Glu Ile Ser Arg Leu Val Val Glu
1185 1190 1195 1200

Arg Leu Ala Ser Gly Val Arg Arg Arg Asp Ala Asp Asp Trp Phe Leu 
1205 1210 1215 

Glu Leu Asp Trp Glu Pro Ala Ala Leu Gly Gly Pro Lys Ile Thr Ala
1220 1225 1230 

Gly Arg Trp Leu Leu Leu Gly Glu Gly Gly Gly Leu Gly Arg Ser Leu
1235 1240 1245

^' s ,!5; Ala Leu Lys Ala Ala Gly His Val Val Val His Ala Ala Gly
1250 1255 1260

Asp Asp Thr Ser Thr Ala Gly Met Arg Ala Leu Leu Ala Asn Ala Phe 
1265 1270 1275 1280 

Asp Gly Gln Ala Pro Thr Ala Val Val His Leu Ser Ser Leu Asp Gly
1285 1290 1295 

Gly Gly Gln Leu Gly Pro Gly Leu Gly Ala Gln Gly Ala Leu Asp Ala
1300 1305 1310 

Pro Arg Ser Pro Asp Val Asp Ala Asp Ala Leu Glu Ser Ala Leu Met 
1315 1320 1325 

Arg Gly Cys Asp Ser Val Leu Ser Leu Val Gln Ala Leu Val Gly Met
1330 1335 1340 

Asp Leu Arg Asn Ala Pro Arg Leu Trp Leu Leu Thr Arg Gly Ala Gln
1345 1350 1355 1360 

Ala Ala Ala Ala Gly Asp Val Ser Val Val Gln Ala Pro Leu Leu Gly
1365 1370 1375

Leu Gly Arg Thr Ile Ala Leu Glu His Ala Glu Leu Arg Cys Ile Ser
1380 1385 1390

Val Asp Leu Asp Pro Ala Glu Pro Glu Gly Glu Ala Asp Ala Leu Leu 
1395 1400 1405 

Ala Glu Leu Leu Ala Asp Asp Ala Glu Glu Glu Val Ala Leu Arg Gly 
1410 1415 1420 

Gly Asp Arg Leu Val Ala Arg Leu Val His Arg Leu Pro Asp Ala Gln
1425 1430 1435 1440

Arg Arg Glu Lys Val Glu Pro Ala Gly Asp Arg Pro Phe Arg Leu Glu
1445 1450 1455

Ile Asp Glu Pro Gly Ala Leu Asp Gln Leu Val Leu Arg Ala Thr Gly
1460 1465 1470 

Arg Arg Ala Pro Gly Pro Gly Glu Val Glu Ile Ser Val Glu Ala Ala
1475 1480 1485 

Gly Leu Asp Ser Ile Asp Ile Gln Leu Ala Leu Gly Val Ala Pro Asn
1490 1495 1500

Asp Leu Pro Gly Glu Glu Ile Glu Pro Leu Val Leu Gly Ser Glu Cys
1505 1510 1515 1520

Ala Gly Arg Ile Val Ala Val Gly Glu Gly Val Asn Gly Leu Val Val
1525 1530 1535 

Gly Gln Pro Val Ile Ala Leu Ala Ala Gly Val Phe Ala Thr His Val
1540 1545 1550

Thr Thr Ser Ala Thr Leu Val Leu Pro Arg Pro Leu Gly Leu Ser Ala 
1555 1560 1565

Thr Glu Ala Ala Ala Met Pro Leu Ala Tyr Leu Thr Ala Trp Tyr Ala 
1570 1575 1580

Leu Asp Lys Val Ala His Leu Gln Ala Gly Glu Arg Val Leu Ile His
1585 1590 1595 1600

Ala Glu Ala Gly Gly Val Gly Leu Cys Ala Val Arg Trp Ala Gln Arg
1605 1610 1615 

Val Gly Ala Glu Val Tyr Ala Thr Ala Asp Thr Pro Glu Asn Arg Ala 
1620 1625 1630 

Tyr Leu Glu Ser Leu Gly Val Arg Tyr Val Ser Asp Ser Arg Ser Gly 
1635 1640 1645

Arg Phe Val Thr Asp Val His Ala Trp Thr Asp Gly Glu Gly Val Asp
1650 1655 1660 

Val Val Leu Asp Ser Leu Ser Gly Glu Arg Ile Asp Lys Ser Leu Met
1665 1670 1675 1680

Val Leu Arg Ala Cys Gly Arg Leu Val Lys Leu Gly Arg Arg Asp Asp 
1685 1690 1695

Cys Ala Asp Thr Gln Pro Gly Leu Pro Pro Leu Leu Arg Asn Phe Ser
1700 1705 1710 
Phe Ser Gln Val Asp Leu Arg Gly Met Met Leu Asp Gln Pro Ala Arg
1715 1720 1725

Ile Arg Ala Leu Leu Asp Glu Leu Phe Gly Leu Val Ala Ala Gly Ala
1730 1735 1740

Ile Ser Pro Leu Gly Ser Gly Leu Arg Val Gly Gly Ser Leu Thr Pro
1745 1750 1755 1760 

Pro Pro Val Glu Thr Phe Pro Ile Ser Arg Ala Ala Glu Ala Phe Arg
1765 1770 1775 

Arg Met Ala Gln Gly Gln His Leu Gly Lys Leu Val Leu Thr Leu Asp
1780 1785 1790 

Asp Pro Glu Val Arg Ile Arg Ala Pro Ala Glu Ser Ser Val Ala Val
1795 1800 1805 

Arg Ala Asp Gly Thr Tyr Leu Val Thr Gly Gly Leu Gly Gly Leu Gly 
1810 1815 1820 

Leu Arg Val Ala Gly Trp Leu Ala Glu Arg Gly Ala Gly Gln Leu Val
1825 1830 1835 1840 

Leu Val Gly Arg Ser Gly Ala Ala Ser Ala Glu Gln Arg Ala Ala Val
1845 1850 1855 

Ala Ala Leu Glu Ala His Gly Ala Arg Val Thr Val Ala Lys Ala Asp 
1860 1865 1870

Val Ala Asp Arg Ser Gln Ile Glu Arg Val Leu Arg Glu Val Thr Ala
1875 1880 1885 

Ser Gly Met Pro Leu Arg Gly Val Val His Ala Ala Gly Leu Val Asp 
1890 1895 1900 

Asp Gly Leu Leu Met Gln Gln Thr Pro Ala Arg Phe Arg Thr Val Met
1905 1910 1915 1920 

Gly Pro Lys Val Gln Gly Ala Leu His Leu His Thr Leu Thr Arg Glu
1925 1930 1935

Ala Pro Leu Ser Phe Phe Val Leu Tyr Ala Ser Ala Ala Gly Leu Phe 
1940 1945 1950 

Gly Ser Pro Gly Gln Gly Asn Tyr Ala Ala Ala Asn Ala Phe Leu Asp
1955 1960 1965

Ala Leu Ser His His Arg Arg Ala Gln Gly Leu Pro Ala Leu Ser Ile
1970 1975 1980 

Asp Trp Gly Met Phe Thr Glu Val Gly Met Ala Val Ala Gln Glu Asn
1985 1990 1995 2000 

Arg Gly Ala Arg Gln Ile Ser Arg Gly Met Arg Gly Ile Thr Pro Asp
2005 2010 2015 

Glu Gly Leu Ser Ala Leu Ala Arg Leu Leu Glu Gly Asp Arg Val Gln
2020 2025 2030 

Thr Gly Val Ile Pro Ile Thr Pro Arg Gln Trp Val Glu Phe Tyr Pro
2035 2040 2045 

Ala Thr Ala Ala Ser Arg Arg Leu Ser Arg Leu Val Thr Thr Gln Arg
2050 2055 2060 



WO 99/66028 



PCT/EP99/04171 






Ala Val Ala Asp Arg Thr Ala Gly Asp Arg Asp Leu Leu Glu Gln Leu
2065 2070 2075 2080 

Ala Ser Ala Glu Pro Ser Ala Arg Ala Gly Leu Leu Gln Asp Val Val
2085 2090 2095 

Arg Val Gln Val Ser His Val Leu Arg Leu Pro Glu Asp Lys Ile Glu
2100 2105 2110 

Val Asp Ala Pro Leu Ser Ser Met Gly Met Asp Ser Leu Met Ser Leu 
2115 2120 2125

Glu Leu Arg Asn Arg Ile Glu Ala Ala Leu Gly Val Ala Ala Pro Ala
2130 2135 2140

Ala Leu Gly Trp Thr Tyr Pro Thr Val Ala Ala Ile Thr Arg Trp Leu
2145 2150 2155 2160

Leu Asp Asp Ala Leu Val Val Arg Leu Gly Gly Gly Ser Asp Thr Asp 
2165 2170 2175

Glu Ser Thr Ala Ser Ala Gly Ser Phe Val His Val Leu Arg Phe Arg 
2180 2185 2190 

Pro Val Val Lys Pro Arg Ala Arg Leu Phe Cys Phe His Gly Ser Gly 
2195 2200 2205

Gly Ser Pro Glu Gly Phe Arg Ser Trp Ser Glu Lys Ser Glu Trp Ser 
2210 2215 2220 

Asp Leu Glu Ile Val Ala Met Trp His Asp Arg Ser Leu Ala Ser Glu
2225 2230 2235 2240

Asp Ala Pro Gly Lys Lys Tyr Val Gln Glu Ala Ala Ser Leu Ile Gln
2245 2250 2255 

His Tyr Ala Asp Ala Pro Phe Ala Leu Val Gly Phe Ser Leu Gly Val
2260 2265 2270 

Arg Phe Val Met Gly Thr Ala Val Glu Leu Ala Ser Arg Ser Gly Ala
2275 2280 2285 

Pro Ala Pro Leu Ala Val Phe Thr Leu Gly Gly Ser Leu Ile Ser Ser
2290 2295 2300 

Ser Glu Ile Thr Pro Glu Met Glu Thr Asp Ile Ile Ala Lys Leu Phe
2305 2310 2315 2320 

Phe Arg Asn Ala Ala Gly Phe Val Arg Ser Thr Gln Gln Val Gln Ala
2325 2330 2335 

Asp Ala Arg Ala Asp Lys Val Ile Thr Asp Thr Met Val Ala Pro Ala
2340 2345 2350 

Pro Gly Asp Ser Lys Glu Pro Pro Val Lys Ile Ala Val Pro Ile Val
2355 2360 2365 

Ala Ile Ala Gly Ser Asp Asp Val Ile Val Pro Pro Ser Asp Val Gln
2370 2375 2380 

Asp Leu Gln Ser Arg Thr Thr Glu Arg Phe Tyr Met His Leu Leu Pro
2385 2390 2395 2400 

Gly Asp His Glu Phe Leu Val Asp Arg Gly Arg Glu Ile Met His Ile
2405 2410 2415

Val Asp Ser His Leu Asn Pro Leu Leu Ala Ala Arg Thr Thr Ser Ser
2420 2425 2430 

Gly Pro Ala Phe Glu Ala Lys 
2435 



<210> 8
<211> 419 
<212> PRT 

<213> Sorangium cellulosum
<400> 8 

Met Thr Gln Glu Gln Ala Asn Gln Ser Glu Thr Lys Pro Ala Phe Asp
1 5 10 15 

Phe Lys Pro Phe Ala Pro Gly Tyr Ala Glu Asp Pro Phe Pro Ala Ile
20 25 30

Glu Arg Leu Arg Glu Ala Thr Pro Ile Phe Tyr Trp Asp Glu Gly Arg
35 40 45

Ser Trp Val Leu Thr Arg Tyr His Asp Val Ser Ala Val Phe Arg Asp 
50 55 60 

Glu Arg Phe Ala Val Ser Arg Glu Glu Trp Glu Ser Ser Ala Glu Tyr 
65 70 75 80

Ser Ser Ala Ile Pro Glu Leu Ser Asp Met Lys Lys Tyr Gly Leu Phe
85 90 95 

Gly Leu Pro Pro Glu Asp His Ala Arg Val Arg Lys Leu Val Asn Pro 
100 105 110

Ser Phe Thr Ser Arg Ala Ile Asp Leu Leu Arg Ala Glu Ile Gln Arg
115 120 125 

Thr Val Asp Gln Leu Leu Asp Ala Arg Ser Gly Gln Glu Glu Phe Asp
130 135 140 

Val Val Arg Asp Tyr Ala Glu Gly Ile Pro Met Arg Ala Ile Ser Ala
145 150 155 160 

Leu Leu Lys Val Pro Ala Glu Cys Asp Glu Lys Phe Arg Arg Phe Gly 
165 170 175 

Ser Ala Thr Ala Arg Ala Leu Gly Val Gly Leu Val Pro Gln Val Asp
180 185 190 

Glu Glu Thr Lys Thr Leu Val Ala Ser Val Thr Glu Gly Leu Ala Leu 
195 200 205 

Leu His Asp Val Leu Asp Glu Arg Arg Arg Asn Pro Leu Glu Asn Asp 
210 215 220 

Val Leu Thr Met Leu Leu Gln Ala Glu Ala Asp Gly Ser Arg Leu Ser
225 230 235 240 

Thr Lys Glu Leu Val Ala Leu Val Gly Ala Ile Ile Ala Ala Gly Thr
245 250 255 

Asp Thr Thr Ile Tyr Leu Ile Ala Phe Ala Val Leu Asn Leu Leu Arg
260 265 270 
Ser Pro Glu Ala Leu Glu Leu Val Lys Ala Glu Pro Gly Leu Met Arg
275 280 285

Asn Ala Leu Asp Glu Val Leu Arg Phe Asp Asn Ile Leu Arg Ile Gly
290 295 300

Thr Val Arg Phe Ala Arg Gln Asp Leu Glu Tyr Cys Gly Ala Ser Ile
305 310 315 320

Lys Lys Gly Glu Met Val Phe Leu Leu Ile Pro Ser Ala Leu Arg Asp
325 330 335

Gly Thr Val Phe Ser Arg Pro Asp Val Phe Asp Val Arg Arg Asp Thr
340 345 350 

Gly Ala Ser Leu Ala Tyr Gly Arg Gly Pro His Val Cys Pro Gly Val
355 360 365

Ser Leu Ala Arg Leu Glu Ala Glu Ile Ala Val Gly Thr Ile Phe Arg
370 375 380 

Arg Phe Pro Glu Met Lys Leu Lys Glu Thr Pro Val Phe Gly Tyr His 
385 390 395 400 

Pro Ala Phe Arg Asn Ile Glu Ser Leu Asn Val Ile Leu Lys Pro Ser
405 410 415

Lys Ala Gly 



<210> 9 
<211> 607 
<212> PRT 

<213> Sorangium cellulosum
<400> 9 

Ala Ser Leu Asp Ala Leu Phe Ala Arg Ala Thr Ser Ala Arg Val Leu 
1 5 10 15 

Asp Asp Gly His Gly Arg Ala Thr Glu Arg His Val Leu Ala Glu Ala
20 25 30 

Arg Gly Ile Glu Asp Leu Arg Ala Leu Arg Glu His Leu Arg Ile Gln
35 40 45 

Glu Gly Gly Pro Ser Phe His Cys Met Cys Leu Gly Asp Leu Thr Val 
50 55 60 

Glu Leu Leu Ala His Asp Gln Pro Leu Ala Ser Ile Ser Phe His His
65 70 75 80 

Ala Arg Ser Leu Arg His Pro Asp Trp Thr Ser Asp Ala Met Leu Val 
85 90 95 

Asp Gly Pro Ala Leu Val Arg Trp Leu Ala Ala Arg Gly Ala Pro Gly
100 105 110 

Pro Leu Arg Glu Tyr Glu Glu Glu Arg Glu Arg Ala Arg Thr Ala Gln
115 120 125

Glu Ala Arg Arg Leu Trp Leu Ala Ala Ala Pro Pro Cys Phe Ala Pro 
130 135 140 
Asp Leu Pro Arg Phe Glu Asp Asp Ala Asn Gly Leu Pro Leu Gly Pro
145 150 155 160

Met Ser Pro Glu Val Ala Glu Ala Glu Arg Arg Leu Arg Ala Ser Tyr
165 170 175 

Ala Thr Pro Glu Leu Ala Cys Ala Ala Leu Leu Ala Trp Leu Gly Thr 
180 185 190 

Gly Ala Gly Pro Trp Ser Gly Tyr Pro Ala Tyr Glu Met Leu Pro Glu 
195 200 205 

Asn Leu Leu Leu Gly Phe Gly Leu Pro Thr Ala Ile Ala Ala Ala Ser
210 215 220 

Ala Pro Gly Thr Ser Glu Ala Ala Leu Arg Gly Ala Ala Arg Leu Phe 
225 230 235 240 

Ala Ser Trp Glu Val Val Ser Ser Lys Lys Ser Gln Leu Gly Asn Ile
245 250 255

Pro Glu Ala Leu Trp Glu Arg Leu Arg Thr Ile Val Arg Ala Met Gly
260 265 270 

Asn Ala Asp Asn Leu Ser Arg Phe Glu Arg Ala Glu Ala Ile Ala Ala
275 280 285 

Glu Val Arg Arg Leu Arg Ala Gln Pro Ala Pro Phe Ala Ala Gly Ala
290 295 300 

Gly Leu Ala Val Ala Gly Val Ser Ser Ser Gly Arg Leu Ser Gly Leu 
305 310 315 320 

Val Thr Asp Gly Asp Ala Leu Tyr Ser Gly Asp Gly Asn Asp Ile Val
325 330 335 

Met Phe Gln Pro Gly Arg Ile Ser Pro Val Val Leu Leu Ala Gly Thr
340 345 350 

Asp Pro Phe Phe Glu Leu Ala Pro Pro Leu Ser Gln Met Leu Phe Val
355 360 365 

Ala His Ala Asn Ala Gly Thr Ile Ser Lys Val Leu Thr Glu Gly Ser
370 375 380 

Pro Leu Ile Val Met Ala Arg Asn Gln Ala Arg Pro Met Ser Leu Val
385 390 395 400

His Ala Arg Gly Phe Met Ala Trp Val Asn Gln Ala Met Val Pro Asp
405 410 415 

Pro Glu Arg Gly Ala Pro Phe Val Val Gln Arg Ser Thr Ile Met Glu
420 425 430 

Phe Glu His Pro Thr Pro Arg Cys Leu His Glu Pro Ala Gly Ser Ala 
435 440 445 

Phe Ser Leu Ala Cys Asp Glu Glu His Leu Tyr Trp Cys Glu Leu Ser 
450 455 460

Ala Gly Arg Leu Glu Leu Trp Arg His Pro His His Arg Pro Gly Ala 
465 470 475 480 

Pro Ser Arg Phe Ala Tyr Leu Gly Glu His Pro Ile Ala Ala Thr Trp
485 490 495 
Tyr Pro Ser Leu Thr Leu Asn Ala Thr His Val Leu Trp Ala Asp Pro
500 505 510 

Asp Arg Arg Ala Ile Leu Gly Val Asp Lys Arg Thr Gly Val Glu Pro
515 520 525 

Ile Val Leu Ala Glu Thr Arg His Pro Pro Ala His Val Val Ser Glu
530 535 540 

Asp Arg Asp Ile Phe Ala Leu Thr Gly Gln Pro Asp Ser Arg Asp Trp
545 550 555 560 

His Val Glu His Ile Arg Ser Gly Ala Ser Thr Val Val Ala Asp Tyr
565 570 575

Gln Arg Gln Leu Trp Asp Arg Pro Asp Met Val Leu Asn Arg Arg Gly
580 585 590

Leu Phe Phe Thr Thr Asn Asp Arg Ile Leu Thr Leu Ala Arg Ser
595 600 605



<210> 10 

<211> 423 

<212> PRT

<213> Sorangium cellulosum

<400> 10 

Met Gly Ala Leu Ile Ser Val Ala Ala Pro Gly Cys Ala Leu Gly Gly
1 5 10 15

Ala Glu Glu Glu Gly Gln Pro Gly Gln Asp Ala Gly Ala Gly Ala Leu
20 25 30

Ala Pro Ala Arg Glu Val Met Ala Ala Glu Val Ala Ala Gly Gln Met
35 40 45 

Pro Gly Ala Val Trp Leu Val Ala Arg Gly Asp Asp Val His Val Asp
50 55 60 

Ala Val Gly Val Thr Glu Leu Gly Gly Ser Ala Pro Met Arg Arg Asp 
65 70 75 80 

Thr Ile Phe Arg Ile Ala Ser Met Thr Lys Ala Val Thr Ala Thr Ala
85 90 95

Val Met Met Leu Val Glu Glu Gly Lys Leu Asp Leu Asp Ser Pro Val
100 105 110

Asp Arg Trp Leu Pro Glu Leu Ala Asn Arg Lys Val Leu Ala Arg Ile
115 120 125

Asp Gly Pro Ile Asp Glu Thr Val Pro Ala Glu Arg Pro Ile Thr Val
130 135 140

Arg Asp Leu Met Thr Phe Thr Met Gly Phe Gly Ile Ser Phe Asp Ala
145 150 155 160

Ser Ser Pro Ile Gln Arg Ala Ile Asp Glu Leu Gly Leu Val Asn Ala
165 170 175

Gln Pro Val Pro Met Thr Pro His Gly Pro Asp Glu Trp Ile Arg Arg
180 185 190

Leu Gly Thr Leu Pro Leu Met His Gln Pro Gly Ala Gln Trp Met Tyr
195 200 205

Asn Thr Gly Ser Leu Val Gln Gly Val Leu Val Gly Arg Ala Ala Asp
210 215 220

Gln Gly Phe Asp Ala Phe Val Arg Glu Arg Ile Leu Ala Pro Leu Gly
225 230 235 240

Met Arg Asp Thr Asp Phe His Val Pro Ala Asp Lys Leu Ala Arg Phe
245 250 255

Ala Gly Cys Gly Tyr Phe Thr Asp Glu Gln Thr Gly Glu Lys Thr Arg
260 265 270

Met Asp Arg Asp Gly Ala Glu Ser Ala Tyr Ala Ser Pro Pro Ala Phe
275 280 285 

Pro Ser Gly Ala Ala Gly Leu Val Ser Thr Val Asp Asp Tyr Leu Leu 
290 295 300 

Phe Ala Arg Met Leu Met Asn Gly Gly Val His Glu Gly Arg Arg Leu 
305 310 315 320 

Leu Ser Ala Ala Ser Val Arg Glu Met Thr Ala Asp His Leu Thr Pro 
325 330 335 

Ala Gln Lys Ala Ala Ser Ser Phe Phe Pro Gly Phe Phe Glu Thr His
340 345 350 

Gly Trp Gly Tyr Gly Met Ala Val Val Thr Ala Pro Asp Ala Val Ser 
355 360 365 

Glu Val Pro Gly Arg Tyr Gly Trp Asp Gly Gly Phe Gly Thr Ser Trp 
370 375 380 

Ile Asn Asp Pro Gly Arg Glu Leu Ile Gly Ile Val Met Thr Gln Ser
385 390 395 400

Ala Gly Phe Leu Phe Ser Gly Ala Leu Glu Arg Phe Trp Arg Ser Val 
405 410 415 

Tyr Val Ala Thr Glu Ser Ala 
420 



<210> 11 
<211> 713 
<212> PRT 

<213> Sorangium cellulosum
<400> 11 

Met His Gly Leu Thr Glu Arg Gln Val Leu Leu Ser Leu Val Thr Leu
1 5 10 15

Ala Leu Ile Leu Val Thr Ala Arg Ala Ser Gly Glu Leu Ala Arg Arg
20 25 30

Leu Arg Gln Pro Glu Val Leu Gly Glu Leu Phe Gly Gly Val Val Leu
35 40 45

Gly Pro Ser Val Val Gly Ala Leu Ala Pro Gly Phe His Arg Ala Leu
50 55 60

Phe Gln Glu Pro Ala Val Gly Val Val Leu Ser Gly Ile Ser Trp Ile
65 70 75 80

Gly Ala Leu Leu Leu Leu Leu Met Ala Gly Ile Glu Val Asp Val Gly
85 90 95

Ile Leu Arg Lys Glu Ala Arg Pro Gly Ala Leu Ser Ala Leu Gly Ala
100 105 110

Ile Ala Pro Pro Leu Ala Ala Gly Ala Ala Phe Ser Ala Leu Val Leu
115 120 125 

Asp Arg Pro Leu Pro Ser Gly Leu Phe Leu Gly Ile Val Leu Ser Val
130 135 140

Thr Ala Val Ser Val Ile Ala Lys Val Leu Ile Glu Arg Glu Ser Met
145 150 155 160

Arg Arg Ser Tyr Ala Gln Val Thr Leu Ala Ala Gly Val Val Ser Glu
165 170 175

Val Ala Ala Trp Val Leu Val Ala Met Thr Ser Ser Ser Tyr Gly Ala
180 185 190

Ser Pro Ala Leu Ala Val Ala Arg Ser Ala Leu Leu Ala Ser Gly Phe
195 200 205

Leu Leu Phe Met Val Leu Val Gly Arg Arg Leu Thr His Leu Ala Met
210 215 220

Arg Trp Val Ala Asp Ala Thr Arg Val Ser Lys Gly Gln Val Ser Leu
225 230 235 240

Val Leu Val Leu Thr Phe Leu Ala Ala Ala Leu Thr Gln Arg Leu Gly
245 250 255 

Leu His Pro Leu Leu Gly Ala Phe Ala Leu Gly Val Leu Leu Asn Ser 
260 265 270 

Ala Pro Arg Thr Asn Arg Pro Leu Leu Asp Gly Val Gln Thr Leu Val
275 280 285

Ala Gly Leu Phe Ala Pro Val Phe Phe Val Leu Ala Gly Met Arg Val
290 295 300

Asp Val Ser Gln Leu Arg Thr Pro Ala Ala Trp Gly Thr Val Ala Leu
305 310 315 320 

Leu Leu Ala Thr Ala Thr Ala Ala Lys Val Val Pro Ala Ala Leu Gly 
325 330 335 

Ala Arg Leu Gly Gly Leu Arg Gly Ser Glu Ala Ala Leu Val Ala Val 
340 345 350

Gly Leu Asn Met Lys Gly Gly Thr Asp Leu Ile Val Ala Ile Val Gly
355 360 365

Val Glu Leu Gly Leu Leu Ser Asn Glu Ala Tyr Thr Met Tyr Ala Val
370 375 380

Val Ala Leu Val Thr Val Thr Ala Ser Pro Ala Leu Leu Ile Trp Leu
385 390 395 400

Glu Lys Arg Ala Pro Pro Thr Gln Glu Glu Ser Ala Arg Leu Glu Arg
405 410 415

Glu Glu Ala Ala Arg Arg Ala Tyr Ile Pro Gly Val Glu Arg Ile Leu
420 425 430

Val Pro Ile Val Ala His Ala Leu Pro Gly Phe Ala Thr Asp Ile Val
435 440 445

Glu Ser Ile Val Ala Ser Lys Arg Lys Leu Gly Glu Thr Val Asp Ile
450 455 460

Thr Glu Leu Ser Val Glu Gln Gln Ala Pro Gly Pro Ser Arg Ala Ala
465 470 475 480

Gly Glu Ala Ser Arg Gly Leu Ala Arg Leu Gly Ala Arg Leu Arg Val
485 490 495

Gly Ile Trp Arg Gln Arg Arg Glu Leu Arg Gly Ser Ile Gln Ala Ile
500 505 510

Leu Arg Ala Ser Arg Asp His Asp Leu Leu Val Ile Gly Ala Arg Ser
515 520 525

Pro Ala Arg Ala Arg Gly Met Ser Phe Gly Arg Leu Gln Asp Ala Ile
530 535 540

Val Gln Arg Ala Glu Ser Asn Val Leu Val Val Val Gly Asp Pro Pro
545 550 555 560

Ala Ala Glu Arg Ala Ser Ala Arg Arg Ile Leu Val Pro Ile Ile Gly
565 570 575

Leu Glu Tyr Ser Phe Ala Ala Ala Asp Leu Ala Ala His Val Ala Leu
580 585 590

Ala Trp Asp Ala Glu Leu Val Leu Leu Ser Ser Ala Gln Thr Asp Pro
595 600 605

Gly Ala Val Val Trp Arg Asp Arg Glu Pro Ser Arg Val Arg Ala Val
610 615 620

Ala Arg Ser Val Val Asp Glu Ala Val Phe Arg Gly Arg Arg Leu Gly
625 630 635 640

Val Arg Val Ser Ser Arg Val His Val Gly Ala His Pro Ser Asp Glu
645 650 655

Ile Thr Arg Glu Leu Ala Arg Ala Pro Tyr Asp Leu Leu Val Leu Gly
660 665 670

Cys Tyr Asp His Gly Pro Leu Gly Arg Leu Tyr Leu Gly Ser Thr Val
675 680 685

Glu Ser Val Val Val Arg Ser Arg Val Pro Val Ala Leu Leu Val Ala
690 695 700

His Gly Gly Thr Arg Glu Gln Val Arg
705 710 



<210> 12 
<211> 126 
<212> PRT 

<213> Sorangium cellulosum 
<400> 12 

Met Asp Lys Pro Ile Gly Arg Thr Arg Cys Ala Ile Ala Glu Gly Tyr



WO 99/66023 



PCT/EP99/04171 






1 5 10 15 

Ile Pro Gly Gly Ser Asn Gly Pro Glu Pro Gln Met Thr Ser His Glu
20 25 30

Thr Ala Cys Leu Leu Asn Ala Ser Asp Arg Asp Ala Gln Val Ala Ile
35 40 45

Thr Val Tyr Phe Ser Asp Arg Asp Pro Ala Gly Pro Tyr Arg Val Thr
50 55 60

Val Pro Ala Arg Arg Thr Arg His Val Arg Phe Asn Asp Leu Thr Glu
65 70 75 80

Pro Glu Pro Ile Pro Arg Asp Thr Asp Tyr Ala Ser Val Ile Glu Ser
85 90 95

Asp Ala Pro Ile Val Val Gln His Thr Arg Leu Asp Ser Arg Gln Ala
100 105 110

Glu Asn Ala Leu Leu Ser Thr Ile Ala Tyr Thr Asp Arg Glu
115 120 125 



<210> 13 
<211> 149 
<212> PRT 

<213> Sorangium cellulosum
<400> 13

Met Lys His Val Asp Thr Gly Arg Arg Phe Gly Arg Arg Ile Gly His
1 5 10 15

Thr Leu Gly Leu Leu Ala Ser Met Ala Leu Ala Gly Cys Gly Gly Pro
20 25 30

Ser Glu Lys Thr Val Gln Gly Thr Arg Leu Ala Pro Gly Ala Asp Ala
35 40 45

Arg Val Thr Ala Asp Val Asp Pro Asp Ala Ala Thr Thr Arg Leu Ala
50 55 60

Val Asp Val Val His Leu Ser Pro Pro Glu Arg Leu Glu Ala Gly Ser
65 70 75 80

Glu Arg Phe Val Val Trp Gln Arg Pro Ser Pro Glu Ser Pro Trp Arg
85 90 95

Arg Val Gly Val Leu Asp Tyr Asn Ala Asp Ser Arg Arg Gly Lys Leu
100 105 110

Ala Glu Thr Thr Val Pro Tyr Ala Asn Phe Glu Leu Leu Ile Thr Ala
115 120 125

Glu Lys Gln Ser Ser Pro Gln Ser Pro Ser Ser Ala Ala Val Ile Gly
130 135 140

Pro Thr Ser Val Gly 
145 



<210> 14 
<211> 184
<212> PRT 

<213> Sorangium cellulosum 






<400> 14 

Val Thr Ser Glu Glu Val Pro Gly Ala Ala Leu Gly Ala Gln Ser Ser
1 5 10 15

Leu Val Arg Ala Gln His Ala Ala Arg His Val Arg Pro Cys Thr Arg
20 25 30

Ala Glu Glu Pro Pro Ala Leu Met His Gly Leu Thr Glu Arg Gln Val
35 40 45

Leu Leu Ser Leu Val Ala Leu Ala Leu Val Leu Leu Thr Ala Arg Ala
50 55 60

Phe Gly Glu Leu Ala Arg Arg Leu Arg Gln Pro Glu Val Leu Gly Glu
65 70 75 80

Leu Phe Gly Gly Val Val Leu Gly Pro Ser Val Val Gly Ala Leu Ala
85 90 95

Pro Gly Phe His Arg Val Leu Phe Gln Asp Pro Ala Val Gly Val Val
100 105 110

Leu Ser Gly Ile Ser Trp Ile Gly Ala Leu Val Leu Leu Leu Met Ala
115 120 125

Gly Ile Glu Val Asp Val Ser Ile Leu Arg Lys Glu Ala Arg Pro Gly
130 135 140

Ala Leu Ser Ala Leu Gly Ala Ile Ala Pro Pro Leu Arg Thr Pro Gly
145 150 155 160

Pro Leu Val Gln Arg Met Gln Gly Ala Phe Thr Trp Asp Leu Asp Val
165 170 175

Ser Pro Arg Arg Ser Ala Gln Ala
180 



<210> 15 
<211> 145
<212> PRT

<213> Sorangium cellulosum
<400> 15

Val Asn Ala Pro Cys Met Arg Cys Thr Ser Gly Pro Gly Val Arg Ser
1 5 10 15

Gly Gly Ala Ile Ala Pro Ser Ala Glu Ser Ala Pro Gly Arg Ala Ser
20 25 30

Leu Arg Arg Met Leu Thr Ser Thr Ser Ile Pro Ala Met Ser Ser Arg
35 40 45

Thr Ser Ala Pro Ile Gln Glu Met Pro Glu Ser Thr Thr Pro Thr Ala
50 55 60

Gly Ser Trp Lys Arg Thr Arg Trp Asn Pro Gly Ala Ser Ala Pro Thr
65 70 75 80

Thr Asp Gly Pro Ser Thr Thr Pro Pro Lys Ser Ser Pro Ser Thr Ser
85 90 95

Gly Trp Arg Ser Arg Arg Ala Ser Ser Pro Lys Ala Arg Ala Val Arg
100 105 110

Arg Thr Ser Ala Arg Ala Thr Ser Glu Ser Arg Thr Cys Arg Ser Val
115 120 125

Arg Pro Cys Ile Arg Ala Gly Gly Ser Ser Ala Arg Val Gln Gly Arg
130 135 140

Thr
145



<210> 16 
<211> 185 
<212> PRT 

<213> Sorangium cellulosum
<400> 16

Val Leu Ala Pro Pro Ala Asp Ile Arg Pro Pro Ala Ala Ala Gln Leu
1 5 10 15

Glu Pro Asp Ser Pro Asp Asp Glu Ala Asp Glu Ala Asp Glu Ala Leu
20 25 30

Arg Pro Phe Arg Asp Ala Ile Ala Ala Tyr Ser Glu Ala Val Arg Trp
35 40 45

Ala Glu Ala Ala Gln Arg Pro Arg Leu Glu Ser Leu Val Arg Leu Ala
50 55 60

Ile Val Arg Leu Gly Lys Ala Leu Asp Lys Val Pro Phe Ala His Thr
65 70 75 80

Thr Ala Gly Val Ser Gln Ile Ala Gly Arg Leu Gln Asn Asp Ala Val
85 90 95

Trp Phe Asp Val Ala Ala Arg Tyr Ala Ser Phe Arg Ala Ala Thr Glu
100 105 110

His Ala Leu Arg Asp Ala Ala Ser Ala Met Glu Ala Leu Ala Ala Gly
115 120 125

Pro Tyr Arg Gly Ser Ser Arg Val Ser Ala Ala Val Gly Glu Phe Arg
130 135 140

Gly Glu Ala Ala Arg Leu His Pro Ala Asp Arg Val Pro Ala Ser Asp
145 150 155 160

Gln Gln Ile Leu Thr Ala Leu Arg Ala Ala Glu Arg Ala Leu Ile Ala
165 170 175

Leu Tyr Thr Ala Phe Ala Arg Glu Glu
180 185



<210> 17
<211> 146
<212> PRT

<213> Sorangium cellulosum

<400> 17

Met Ala Asp Ala Ala Ser Arg Ser Ala Cys Ser Val Ala Ala Arg Lys
1 5 10 15

Leu Ala Tyr Arg Ala Ala Thr Ser Asn Gln Thr Ala Ser Phe Trp Ser
20 25 30

Leu Pro Ala Ile Trp Glu Thr Pro Ala Val Val Cys Ala Lys Gly Thr
35 40 45

Leu Ser Ser Ala Leu Pro Ser Arg Thr Ile Ala Ser Arg Thr Arg Leu
50 55 60

Ser Ser Arg Gly Arg Cys Ala Ala Ser Ala His Arg Thr Ala Ser Glu
65 70 75 80

Tyr Ala Ala Ile Ala Ser Arg Asn Gly Arg Ser Ala Ser Ser Ala Ser
85 90 95

Ser Ala Ser Ser Ser Gly Glu Ser Gly Ser Ser Trp Ala Ala Ala Gly
100 105 110

Gly Arg Met Ser Ala Gly Gly Ala Ser Thr Gly Glu Val Tyr Glu Gln
115 120 125

Ala Pro Arg Leu Arg Leu Ala Gln Ser Val Ala Ala Arg Arg Arg Asp
130 135 140 

Pro Thr 
145 



<210> 18 
<211> 288 
<212> PRT 

<213> Sorangium cellulosum
<400> 18

Val Thr Val Ser Ser Met Pro Arg Ser Trp Ser Ser Arg Val Arg Thr
1 5 10 15

Val Val Thr Ala Leu Gly Cys Ala Arg Arg Leu Ser Gly Ser Ile Ser
20 25 30

Arg Leu Arg Arg His Pro Glu Ala Gly Arg Ala Pro Arg Ser Arg Leu
35 40 45

Arg Ala Trp Arg Arg Leu Pro Gln His Ile Ser Ser Pro Trp Arg His
50 55 60

Leu Pro Pro Gly Ala Arg Val Gly Thr Ser Cys Pro Ala Asp Arg Arg
65 70 75 80

Ile Leu Pro Ser His Arg Thr Ala Asp Leu Gly Thr Ser Gly Gly Thr
85 90 95

Leu Val Ala Arg Met Ser Gly His Val Ala Arg Asn Pro His Ala Ala
100 105 110

Val Leu Val Gly Asp Gly Ser Ala Arg Gly Arg Arg Arg Leu Ser Asn
115 120 125

Arg Arg Ala Glu Arg Arg Val Ser Asp Val Thr Cys Arg Glu Gly Gly
130 135 140

Glu Ala Met Gln Lys Ile Ala Gly Lys Leu Val Val Gly Leu Ile Ser
145 150 155 160

Val Ser Gly Met Ser Leu Leu Ala Ala Cys Gly Gly Glu Lys Arg Ser
165 170 175

Gly Gly Glu Ala Gln Thr Pro Gly Gly Ala Gln Gly Glu Ala Pro Val
180 185 190

Pro Val Gly Ser Ala Val Asp Ser Ile Val Ala Ala Arg Cys Asp Arg
195 200 205

Glu Ala Arg Cys Asn Asn Ile Gly Gln Asp Arg Glu Tyr Ser Ser Lys
210 215 220

Asp Ala Cys Ser Asn Lys Ile Arg Ser Glu Trp Arg Asp Glu Leu Thr
225 230 235 240

Phe Gly Glu Cys Pro Gly Gly Ile Asp Ala Lys Gln Leu Asn Glu Cys
245 250 255

Leu Glu Gly Ile Arg Asn Glu Gly Cys Gly Asn Pro Phe Asp Thr Leu
260 265 270

Gly Arg Val Val Ala Cys Arg Ser Ser Asp Leu Cys Arg Asp Ala Arg 
275 280 285 



<210> 19
<211> 288
<212> PRT

<213> Sorangium cellulosum
<400> 19

Val Thr Val Ser Ser Met Pro Arg Ser Trp Ser Ser Arg Val Arg Thr 
1 5 10 15 

Val Val Thr Ala Leu Gly Cys Ala Arg Arg Leu Ser Gly Ser Ile Ser
20 25 30 

Arg Leu Arg Arg His Pro Glu Ala Gly Arg Ala Pro Arg Ser Arg Leu 
35 40 45 

Arg Ala Trp Arg Arg Leu Pro Gln His Ile Ser Ser Pro Trp Arg His
50 55 60 

Leu Pro Pro Gly Ala Arg Val Gly Thr Ser Cys Pro Ala Asp Arg Arg 
65 70 75 80 

Ile Leu Pro Ser His Arg Thr Ala Asp Leu Gly Thr Ser Gly Gly Thr
85 90 95 

Leu Val Ala Arg Met Ser Gly His Val Ala Arg Asn Pro His Ala Ala 
100 105 110 

Val Leu Val Gly Asp Gly Ser Ala Arg Gly Arg Arg Arg Leu Ser Asn 
115 120 125 

Arg Arg Ala Glu Arg Arg Val Ser Asp Val Thr Cys Arg Glu Gly Gly 
130 135 140 

Glu Ala Met Gln Lys Ile Ala Gly Lys Leu Val Val Gly Leu Ile Ser
145 150 155 160

Val Ser Gly Met Ser Leu Leu Ala Ala Cys Gly Gly Glu Lys Arg Ser
165 170 175

Gly Gly Glu Ala Gln Thr Pro Gly Gly Ala Gln Gly Glu Ala Pro Val
180 185 190

Pro Val Gly Ser Ala Val Asp Ser Ile Val Ala Ala Arg Cys Asp Arg
195 200 205

Glu Ala Arg Cys Asn Asn Ile Gly Gln Asp Arg Glu Tyr Ser Ser Lys
210 215 220

Asp Ala Cys Ser Asn Lys Ile Arg Ser Glu Trp Arg Asp Glu Leu Thr
225 230 235 240

Phe Gly Glu Cys Pro Gly Gly Ile Asp Ala Lys Gln Leu Asn Glu Cys
245 250 255

Leu Glu Gly Ile Arg Asn Glu Gly Cys Gly Asn Pro Phe Asp Thr Leu
260 265 270 

Gly Arg Val Val Ala Cys Arg Ser Ser Asp Leu Cys Arg Asp Ala Arg 
275 280 285 



<210> 20 
<211> 155 
<212> PRT 

<213> Sorangium cellulosum
<400> 20 

Met Asp Pro Arg Ala Arg Arg Glu Lys Arg Pro Ser Leu Leu Asp Ser 
1 5 10 15 

Arg Gly Arg Gln Pro Lys Arg Ser Gln Gln Gly Gly His Met Glu Lys
20 25 30

Pro Ile Gly Arg Thr Arg Trp Ala Ile Ala Glu Gly Tyr Ile Pro Gly
35 40 45

Arg Ser Asn Gly Pro Glu Pro Gln Met Thr Ser His Glu Thr Ala Cys
50 55 60

Leu Leu Asn Ala Ser Asp Arg Asp Ala Gln Val Ala Ile Thr Val Tyr
65 70 75 80

Phe Ser Asp Arg Asp Pro Ala Gly Pro Tyr Arg Val Thr Val Pro Ala
85 90 95

Arg Arg Thr Arg His Val Arg Phe Asn Asp Leu Thr Glu Pro Glu Pro
100 105 110

Ile Pro Arg Asp Thr Asp Tyr Ala Ser Val Ile Glu Ser Asp Val Pro
115 120 125

Ile Val Val Gln His Thr Arg Leu Asp Ser Arg Gln Ala Glu Asn Ala
130 135 140

Leu Ile Ser Thr Ile Ala Tyr Thr Asp Arg Glu
145 150 155 

<210> 21 
<211> 156 
<212> PRT 

<213> Sorangium cellulosum 
<400> 21

Val Arg Arg Ser Arg Trp Gln Met Lys His Val Asp Thr Gly Arg Arg
1 5 10 15

Val Gly Arg Arg Ile Gly Leu Thr Leu Gly Leu Leu Ala Ser Met Ala
20 25 30

Leu Ala Gly Cys Gly Gly Pro Ser Glu Lys Ile Val Gln Gly Thr Arg
35 40 45

Leu Ala Pro Gly Ala Asp Ala His Val Ala Ala Asp Val Asp Pro Asp
50 55 60

Ala Ala Thr Thr Arg Leu Ala Val Asp Val Val His Leu Ser Pro Pro
65 70 75 80

Glu Arg Ile Glu Ala Gly Ser Glu Arg Phe Val Val Trp Gln Arg Pro
85 90 95

Ser Ser Glu Ser Pro Trp Gln Arg Val Gly Val Leu Asp Tyr Asn Ala
100 105 110

Ala Ser Arg Arg Gly Lys Leu Ala Glu Thr Thr Val Pro His Ala Asn
115 120 125

Phe Glu Leu Leu Ile Thr Val Glu Lys Gln Ser Ser Pro Gln Ser Pro
130 135 140

Ser Ser Ala Ala Val Ile Gly Pro Thr Ser Val Gly
145 150 155



<210> 22 

<211> 305 

<212> PRT 

<213> Sorangium cellulosum

<400> 22

Met Glu Lys Glu Ser Arg Ile Ala Ile Tyr Gly Ala Ile Ala Ala Asn
1 5 10 15

Val Ala Ile Ala Ala Val Lys Phe Ile Ala Ala Ala Val Thr Gly Ser
20 25 30

Ser Ala Met Leu Ser Gln Gly Val His Ser Leu Val Asp Thr Ala Asp
35 40 45

Gly Leu Leu Leu Leu Leu Gly Lys His Arg Ser Ala Arg Pro Pro Asp
50 55 60

Ala Glu His Pro Phe Gly His Gly Lys Glu Leu Tyr Phe Trp Thr Leu
65 70 75 80

Ile Val Ala Ile Met Ile Phe Ala Ala Gly Gly Gly Val Ser Ile Tyr
85 90 95

Glu Gly Ile Leu His Leu Leu His Pro Arg Gln Ile Glu Asp Pro Thr
100 105 110

Trp Asn Tyr Val Val Leu Gly Ala Ala Ala Val Phe Glu Gly Thr Ser
115 120 125

Leu Ile Ile Ser Ile His Glu Phe Lys Lys Lys Asp Gly Gln Gly Tyr
130 135 140

Leu Ala Ala Met Arg Ser Ser Lys Asp Pro Thr Thr Phe Thr Ile Val
145 150 155 160

Leu Glu Asp Ser Ala Ala Leu Ala Gly Leu Thr Ile Ala Phe Leu Gly
165 170 175

Val Trp Leu Gly His Arg Leu Gly Asn Pro Tyr Leu Asp Gly Ala Ala
180 185 190

Ser Ile Gly Ile Gly Leu Val Leu Ala Ala Val Ala Val Phe Leu Ala
195 200 205

Ser Gln Ser Arg Gly Leu Leu Val Gly Glu Ser Ala Asp Arg Glu Leu
210 215 220

Leu Ala Ala Ile Arg Ala Leu Ala Ser Ala Asp Pro Gly Val Ser Ala
225 230 235 240

Val Gly Arg Pro Leu Thr Met His Phe Gly Pro His Glu Val Leu Val
245 250 255

Val Leu Arg Ile Glu Phe Asp Ala Ala Leu Thr Ala Ser Gly Val Ala
260 265 270

Glu Ala Ile Glu Arg Ile Glu Thr Arg Ile Arg Ser Glu Arg Pro Asp
275 280 285

Val Lys His Ile Tyr Val Glu Ala Arg Ser Leu His Gln Arg Ala Arg
290 295 300

Ala
305



<210> 23 
<211> 135 
<212> PRT 

<213> Sorangium cellulosum 
<400> 23 

Val Gln Thr Ser Ser Phe Asp Ala Arg Tyr Ala Gly Cys Lys Ser Ser
1 5 10 15

Arg Arg Ile Ala Arg Ser Gly Ser Ala Gly Ala Arg Ala Gly Arg Ala
20 25 30

His Glu Gly Ala Ala Ser Ala Gly Phe Glu Gly Gly Asp Val Met Arg
35 40 45

Lys Ala Arg Ala His Gly Ala Met Leu Gly Gly Arg Asp Asp Gly Trp
50 55 60

Arg Arg Gly Leu Pro Gly Ala Gly Ala Leu Arg Ala Ala Leu Gln Arg
65 70 75 80

Gly Arg Ser Arg Asp Leu Ala Arg Arg Arg Leu Ile Ala Ser Val Ser
85 90 95

Leu Ala Gly Gly Ala Ser Met Ala Val Val Ser Leu Phe Gln Leu Gly
100 105 110

Ile Ile Glu Arg Leu Pro Asp Pro Pro Leu Pro Gly Phe Asp Ser Ala
115 120 125

Lys Val Thr Ser Ser Asp Ile
130 135



<210> 24 
<211> 19 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: universal 
reverse primer 

<400> 24

ggaaacagct atgaccatg 

<210> 25 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: universal 
forward primer 

<400> 25 

gtaaaacgac ggccagt 

<210> 26
<211> 28 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer 
NH24 end "B" 

<400> 26 

gtgactggcg cctggaatct gcatgagc

<210> 27 
<211> 28 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer NH2
end "A"

<400> 27 

agcgggagct tgctagacat tctgtttc 

<210> 28 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer NH2
end "B" 

<400> 28 

gacgcgcctc gggcagcgcc ccaa 



<210> 29 
<211> 25
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer 
pEP015-NH6 end "B" 

<400> 29 

caccgaagcg tcgatctggt ccatc 

<210> 30 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer 
PEP015H2.7 end "A" 

<400> 30 

cggtcagatc gacgacgggc tttcc 